
How to Create a Correct Robots.txt for SEO

· 3 min read

Robots.txt is a file that tells search engine crawlers which pages may and may not be crawled. It is important for crawl budget optimization, though it is not a true privacy or security control (more on that below).

What Is Robots.txt?

Location: yoursite.com/robots.txt
Purpose: Guide search engine crawlers
Format: Plain text file
Standard: Robots Exclusion Protocol

Basic Syntax

# Comment line
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml

Directives

Directive    | Function
------------ | -------------------------------------------------
User-agent   | Targets a specific crawler
Disallow     | Blocks a path
Allow        | Permits a path (overrides Disallow)
Sitemap      | Declares the sitemap location
Crawl-delay  | Seconds to wait between requests (ignored by Googlebot)
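Python's standard library ships a robots.txt parser, which is handy for checking these directives programmatically before deploying a file. A minimal sketch using the basic-syntax rules above (the hostname is a placeholder; note that `urllib.robotparser` applies rules in file order and does not support wildcards, unlike Google's longest-match semantics):

```python
from urllib.robotparser import RobotFileParser

# Rules from the basic syntax example above
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(user_agent, url) evaluates the matching group's rules
print(rp.can_fetch("*", "https://yoursite.com/private/page"))  # False
print(rp.can_fetch("*", "https://yoursite.com/public/page"))   # True
```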

Robots.txt Examples

Basic (Allow All)

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml

Block Specific Folder

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/

Sitemap: https://yoursite.com/sitemap.xml

Block All Crawlers

User-agent: *
Disallow: /

Different Rules per Bot

User-agent: Googlebot
Disallow: /nogoogle/

User-agent: Bingbot
Disallow: /nobing/

User-agent: *
Disallow: /private/

WordPress Standard

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/

Sitemap: https://yoursite.com/sitemap_index.xml

E-commerce

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=

Sitemap: https://yoursite.com/sitemap.xml

Pattern Matching

Wildcards

# Block all PDF files
User-agent: *
Disallow: /*.pdf$

# Block all query strings
Disallow: /*?

# Block specific parameter
Disallow: /*?ref=

End of URL ($)

# Only block .pdf files
Disallow: /*.pdf$

# This blocks /file.pdf
# But allows /file.pdf/page
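These two wildcard characters map cleanly onto regular expressions, which makes it easy to sanity-check a pattern before deploying it. A sketch of that translation (`robots_pattern_to_regex` is a hypothetical helper mirroring Google's documented `*` and `$` semantics, not a full robots.txt matcher):

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape regex metacharacters, then restore robots wildcards:
    # '*' matches any character sequence; a trailing '$' anchors the URL end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = re.escape(core).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

pdf_rule = robots_pattern_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/file.pdf")))       # True
print(bool(pdf_rule.match("/file.pdf/page")))  # False
```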

Common Mistakes

❌ Blocking CSS/JS (hurts rendering)
❌ Blocking images (hurts image SEO)
❌ Typos in syntax
❌ Wrong file location
❌ Using noindex in robots.txt (doesn't work)
❌ Blocking sitemap

Correct Approach

# Allow CSS and JS for rendering
User-agent: *
Allow: /wp-includes/js/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Disallow: /wp-admin/

Testing Robots.txt

Google Search Console

  1. Open Settings > robots.txt report (the old standalone Tester was retired)
  2. Check the fetch status and any parse errors
  3. Confirm the fetched content matches what you deployed

Screaming Frog

  1. Configuration > robots.txt
  2. Test custom robots.txt
  3. See blocked URLs

Manual Check

curl https://yoursite.com/robots.txt
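The same check can be scripted. One subtlety worth encoding: robots.txt applies per host (scheme + domain + port), so the correct URL is always derived from the site root, never from a page path. A small sketch (`robots_url` is a hypothetical helper, not part of any library):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    # robots.txt must live at the root of the host serving the page
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?x=1"))
# → https://yoursite.com/robots.txt
```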

Robots.txt vs Noindex

Robots.txt:
- Controls crawling
- Doesn't prevent indexing
- File-based

Noindex:
- Controls indexing
- Page still crawled
- Meta tag/header

Best Practice:
- Use robots.txt for crawl efficiency
- Use noindex to prevent indexing
- Don't block pages you want noindexed

Important Notes

Blocking Doesn’t Mean Private

Warning:
Robots.txt is public
Anyone can read it
Not a security measure

For sensitive content:
- Password protection
- Server-side auth
- Not just robots.txt

Blocked Pages Can Still Index

If page has backlinks:
- URL may still appear in search
- Just without description
- Shows "blocked by robots.txt"

To truly prevent indexing:
- Use noindex tag
- Don't block crawling
- Let Google see the noindex

Best Practices Checklist

✓ Place at root domain
✓ Include sitemap location
✓ Test before deploying
✓ Don't block important resources
✓ Use for crawl efficiency
✓ Regular review and update
✗ Don't rely for security
✗ Don't block a page and expect its noindex tag to work

Conclusion

Robots.txt is a tool for controlling crawling, not indexing and not security. Use it wisely to optimize crawl budget and guide crawlers toward your important content.

Written by

Hendra Wijaya
