
How to Create a Proper Robots.txt for SEO

3 min read

Robots.txt is a file that tells search engine crawlers which pages they may and may not crawl. It is important for crawl budget optimization and privacy.

What Is Robots.txt?

Location: yoursite.com/robots.txt
Purpose: Guide search engine crawlers
Format: Plain text file
Standard: Robots Exclusion Protocol

Basic Syntax

# Comment line
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml

Directives

Directive     Function
User-agent    Target crawler
Disallow      Block path
Allow         Permit path (overrides Disallow)
Sitemap       Sitemap location
Crawl-delay   Wait between requests
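The behavior of these directives can also be checked programmatically. Below is a minimal sketch using Python's standard-library urllib.robotparser; yoursite.com and MyBot are placeholder names, not from any real site.

```python
import urllib.robotparser

# Parse a small robots.txt in memory (no network request needed).
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /public/",
    "Crawl-delay: 5",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Disallowed path -> crawler should not fetch it.
print(rp.can_fetch("MyBot", "https://yoursite.com/private/page"))  # False
# Explicitly allowed path -> fetching is fine.
print(rp.can_fetch("MyBot", "https://yoursite.com/public/page"))   # True
# Crawl-delay declared for all bots.
print(rp.crawl_delay("MyBot"))                                     # 5
```

Note that the stdlib parser applies the first matching rule rather than Google's longest-match rule, so results can differ from Googlebot's on overlapping Allow/Disallow paths.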

Robots.txt Examples

Basic (Allow All)

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml

Block Specific Folder

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/

Sitemap: https://yoursite.com/sitemap.xml

Block All Crawlers

User-agent: *
Disallow: /

Different Rules per Bot

User-agent: Googlebot
Disallow: /nogoogle/

User-agent: Bingbot
Disallow: /nobing/

User-agent: *
Disallow: /private/

WordPress Standard

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/

Sitemap: https://yoursite.com/sitemap_index.xml

E-commerce

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=

Sitemap: https://yoursite.com/sitemap.xml

Pattern Matching

Wildcards (*)

# Block all PDF files
User-agent: *
Disallow: /*.pdf$

# Block any URL containing a query string
Disallow: /*?

# Block a specific parameter
Disallow: /*?ref=

End of URL ($)

# Only block .pdf files
Disallow: /*.pdf$

This blocks /file.pdf but allows /file.pdf/page.
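Support for these wildcards varies between parsers (Python's stdlib urllib.robotparser, for example, does simple prefix matching). The matching rules themselves are easy to sketch; robots_pattern_to_regex below is a hypothetical helper, not part of any library, that translates a pattern into a regular expression.

```python
import re

def robots_pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex:
    '*' matches any run of characters, and a trailing '$' anchors
    the pattern to the end of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

blocked = robots_pattern_to_regex("/*.pdf$")
print(bool(blocked.match("/file.pdf")))       # True  -> blocked
print(bool(blocked.match("/file.pdf/page")))  # False -> allowed
```

This reproduces the example above: the `$` anchor is why /file.pdf matches but /file.pdf/page does not.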

Common Mistakes

❌ Blocking CSS/JS (hurts rendering)
❌ Blocking images (hurts image SEO)
❌ Typos in syntax
❌ Wrong file location
❌ Using noindex in robots.txt (doesn't work)
❌ Blocking sitemap

Correct Approach

# Allow CSS and JS for rendering
User-agent: *
Allow: /wp-includes/js/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Disallow: /wp-admin/

Testing Robots.txt

Google Search Console

  1. Settings > robots.txt Tester
  2. Enter URL to test
  3. Check if blocked/allowed

Screaming Frog

  1. Configuration > robots.txt
  2. Test custom robots.txt
  3. See blocked URLs

Manual Check

curl https://yoursite.com/robots.txt

Robots.txt vs Noindex

Robots.txt:

  • Controls crawling
  • Doesn't prevent indexing
  • File-based

Noindex:

  • Controls indexing
  • Page still crawled
  • Meta tag/header
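As a reminder of what the noindex side looks like, it can be declared either in the page's HTML or as an HTTP response header:

```
<!-- In the page's <head> -->
<meta name="robots" content="noindex">

# Or as an HTTP response header
X-Robots-Tag: noindex
```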

Best Practice:

  • Use robots.txt for crawl efficiency
  • Use noindex to prevent indexing
  • Don't block pages you want noindexed

Important Notes

Blocking Doesn’t Mean Private

Warning:

  • Robots.txt is public
  • Anyone can read it
  • Not a security measure

For sensitive content:

  • Password protection
  • Server-side auth
  • Not just robots.txt

Blocked Pages Can Still Index

If the page has backlinks:

  • URL may still appear in search
  • Just without a description
  • Shows "blocked by robots.txt"

To truly prevent indexing:

  • Use noindex tag
  • Don't block crawling
  • Let Google see the noindex

Best Practices Checklist

✓ Place at root domain
✓ Include sitemap location
✓ Test before deploying
✓ Don't block important resources
✓ Use for crawl efficiency
✓ Review and update regularly
✗ Don't rely on it for security
✗ Don't block a page and expect noindex to work

Conclusion

Robots.txt is a tool for controlling crawling, not indexing or security. Use it wisely to optimize crawl budget and guide crawlers to your important content.

Written by

Hendra Wijaya
