Importing data from CSV is a common task in migrations and data integration. After importing tens of millions of rows from various sources, I will share the methods I have found most efficient and reliable.
CSV Preparation
1. Ideal CSV Format
id,name,email,created_at
1,John Doe,[email protected],2024-01-15
2,Jane Smith,[email protected],2024-01-16
3,Bob Johnson,[email protected],2024-01-17
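Before importing, it is worth verifying that the file actually has this shape (correct header, consistent column count). A minimal stdlib-only sketch; the function name is my own:

```python
import csv
import io

def check_csv_shape(text, expected_header):
    """Return True if the header matches and every row has the same column count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != expected_header:
        return False
    return all(len(row) == len(header) for row in reader)

sample = "id,name,email,created_at\n1,John Doe,[email protected],2024-01-15\n"
print(check_csv_shape(sample, ["id", "name", "email", "created_at"]))  # True
```

For real files, pass `open(path).read()` or adapt the function to take a file object directly.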
2. Pre-processing CSV
# Check encoding
file -i data.csv

# Convert to UTF-8 if needed
iconv -f ISO-8859-1 -t UTF-8 data.csv > data_utf8.csv

# Remove the BOM (Byte Order Mark)
sed '1s/^\xEF\xBB\xBF//' data_utf8.csv > data_clean.csv

# Normalize line endings
dos2unix data_clean.csv

# Validate CSV structure
head -5 data_clean.csv
wc -l data_clean.csv
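The same cleanup can be done in one pass from Python. A sketch of the iconv / sed / dos2unix steps above, assuming an ISO-8859-1 source (the function name and file names are my own):

```python
def clean_csv_bytes(raw: bytes, source_encoding: str = "ISO-8859-1") -> bytes:
    """Re-encode to UTF-8, strip a leading BOM, and normalize line endings."""
    if raw.startswith(b"\xef\xbb\xbf"):
        # A UTF-8 BOM means the data is already UTF-8; just drop the BOM
        text = raw[3:].decode("utf-8")
    else:
        text = raw.decode(source_encoding)
    return text.replace("\r\n", "\n").encode("utf-8")

# Usage (hypothetical file names):
# with open("data.csv", "rb") as f:
#     cleaned = clean_csv_bytes(f.read())
# with open("data_clean.csv", "wb") as f:
#     f.write(cleaned)
```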
3. Handle Special Characters
# Strip all quotes (destructive: only safe if no field contains commas)
sed 's/"//g' data.csv > data_noquotes.csv

# Or escape them properly
awk -F, '{gsub(/"/,"\"\""); print}' data.csv > data_escaped.csv

# For newlines inside fields, use a proper CSV parser such as csvkit
csvformat -U 1 data.csv > data_formatted.csv
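Python's built-in csv module already handles embedded quotes and newlines according to the CSV rules, so no manual sed/awk escaping is needed when the import goes through a script:

```python
import csv
import io

# The csv module parses doubled quotes and quoted newlines correctly
data = 'id,note\n1,"He said ""hi"",\nthen left"\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows[1])  # ['1', 'He said "hi",\nthen left']
```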
Method 1: LOAD DATA INFILE (Fastest)
1. Enable File Permissions
-- Check the current setting
SHOW VARIABLES LIKE 'secure_file_priv';
-- If the result is /var/lib/mysql-files/,
-- copy the CSV into that directory
# Copy the CSV into the secure_file_priv directory
sudo cp data.csv /var/lib/mysql-files/
sudo chown mysql:mysql /var/lib/mysql-files/data.csv
2. Create Table
CREATE TABLE customers_import (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
created_at DATE,
INDEX idx_email (email),
INDEX idx_created (created_at)
) ENGINE=InnoDB;
3. Basic LOAD DATA
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id, name, email, created_at);
4. Advanced LOAD DATA with Transformations
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
id = @id,
name = TRIM(@name),
email = LOWER(TRIM(@email)),
created_at = STR_TO_DATE(@created_at, '%Y-%m-%d');
5. LOAD DATA with Validation
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
id = @id,
name = NULLIF(TRIM(@name), ''),
email = CASE
WHEN @email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'
THEN LOWER(TRIM(@email))
ELSE NULL
END,
created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));
6. LOAD DATA for Large Files (Chunked)
-- Split the large file into chunks first
-- Import chunk 1 (rows 1-1000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk1.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Import chunk 2 (rows 1000001-2000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk2.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
7. LOAD DATA from a Remote Client (LOCAL)
# From the client machine, upload the file via the MySQL client.
# Requires local_infile to be enabled on the server as well.
mysql --local-infile=1 -u root -p -e "
LOAD DATA LOCAL INFILE '/path/to/local/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
"
Method 2: MySQL Workbench
1. Table Data Import Wizard
Steps:
1. Connect to the database
2. Right-click schema → Table Data Import Wizard
3. Select CSV file
4. Configure:
- Destination table (existing or new)
- Encoding: utf8mb4
- Field separator: ,
- Line separator: \n
- Enclose strings: "
- Escape character: \
- First line is header: ✓
5. Column mapping
6. Review and import
2. Import with a Stored Procedure
DELIMITER $$
CREATE PROCEDURE ImportCSVData(
    IN p_file_path VARCHAR(255),
    IN p_table_name VARCHAR(64)
)
BEGIN
    SET @sql = CONCAT(
        'LOAD DATA INFILE ''', p_file_path, '''',
        ' INTO TABLE ', p_table_name,
        ' FIELDS TERMINATED BY '','' ',
        'OPTIONALLY ENCLOSED BY ''"'' ',
        'LINES TERMINATED BY ''\n'' ',
        'IGNORE 1 ROWS'
    );
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;
-- Usage
CALL ImportCSVData('/var/lib/mysql-files/data.csv', 'customers_import');
Method 3: Python Script with mysql-connector
1. Basic Python Import
import csv
import mysql.connector
from mysql.connector import Error

def import_csv_to_mysql(csv_file, table_name):
    conn = None
    try:
        # Connect to the database
        conn = mysql.connector.connect(
            host='localhost',
            database='your_database',
            user='your_username',
            password='your_password'
        )
        cursor = conn.cursor()

        # Read the CSV
        with open(csv_file, 'r', encoding='utf-8') as file:
            csv_reader = csv.DictReader(file)

            # Prepare the INSERT statement
            columns = csv_reader.fieldnames
            placeholders = ', '.join(['%s'] * len(columns))
            query = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"

            # Batch the inserts for performance
            batch_size = 1000
            batch = []
            for row in csv_reader:
                batch.append(tuple(row.values()))
                if len(batch) >= batch_size:
                    cursor.executemany(query, batch)
                    conn.commit()
                    print(f"Inserted {len(batch)} rows")
                    batch = []

            # Insert the remaining rows
            if batch:
                cursor.executemany(query, batch)
                conn.commit()
                print(f"Inserted {len(batch)} rows")

        print(f"Import completed: {csv_file}")
    except Error as e:
        print(f"Error: {e}")
    finally:
        if conn is not None and conn.is_connected():
            cursor.close()
            conn.close()

# Usage
import_csv_to_mysql('data.csv', 'customers_import')
2. Advanced Python Import with Validation
import csv
import re
from datetime import datetime

import mysql.connector

class CSVImporter:
    def __init__(self, host, database, user, password):
        self.conn = mysql.connector.connect(
            host=host, database=database, user=user, password=password
        )
        self.cursor = self.conn.cursor()
        self.stats = {'imported': 0, 'skipped': 0, 'errors': 0}

    def validate_email(self, email):
        pattern = r'^[\w.-]+@[\w.-]+\.\w+$'
        return re.match(pattern, email) is not None

    def validate_date(self, date_str, fmt='%Y-%m-%d'):
        try:
            datetime.strptime(date_str, fmt)
            return True
        except ValueError:
            return False

    def transform_row(self, row):
        """Transform and validate a single row."""
        transformed = {}

        # ID
        try:
            transformed['id'] = int(row['id'])
        except (ValueError, KeyError):
            return None, "Invalid ID"

        # Name
        name = row.get('name', '').strip()
        if not name:
            return None, "Empty name"
        transformed['name'] = name[:100]  # Truncate

        # Email
        email = row.get('email', '').strip().lower()
        if not self.validate_email(email):
            return None, f"Invalid email: {email}"
        transformed['email'] = email

        # Date
        date_str = row.get('created_at', '')
        if not self.validate_date(date_str):
            return None, f"Invalid date: {date_str}"
        transformed['created_at'] = date_str

        return transformed, None

    def import_file(self, csv_file, table_name, batch_size=1000):
        with open(csv_file, 'r', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            query = (f"INSERT INTO {table_name} (id, name, email, created_at) "
                     "VALUES (%s, %s, %s, %s)")
            batch = []
            for row_num, row in enumerate(reader, start=2):  # start=2: header is row 1
                transformed, error = self.transform_row(row)
                if error:
                    print(f"Row {row_num}: {error}")
                    self.stats['skipped'] += 1
                    continue
                batch.append((
                    transformed['id'],
                    transformed['name'],
                    transformed['email'],
                    transformed['created_at'],
                ))
                if len(batch) >= batch_size:
                    self._insert_batch(query, batch)
                    batch = []
            if batch:
                self._insert_batch(query, batch)
        self._print_stats()

    def _insert_batch(self, query, batch):
        try:
            self.cursor.executemany(query, batch)
            self.conn.commit()
            self.stats['imported'] += len(batch)
            print(f"Inserted batch of {len(batch)} rows")
        except Exception as e:
            self.conn.rollback()
            self.stats['errors'] += len(batch)
            print(f"Batch insert error: {e}")

    def _print_stats(self):
        print("\n=== Import Statistics ===")
        print(f"Imported: {self.stats['imported']}")
        print(f"Skipped: {self.stats['skipped']}")
        print(f"Errors: {self.stats['errors']}")

    def close(self):
        self.cursor.close()
        self.conn.close()

# Usage
importer = CSVImporter('localhost', 'mydb', 'user', 'pass')
importer.import_file('data.csv', 'customers_import')
importer.close()
Method 4: mysqlimport Command Line
# The CSV file must be named after the target table
# Table: customers_import → File: customers_import.txt

# Convert the CSV to mysqlimport format
# (tab-separated, no header)
csvformat -T data.csv | tail -n +2 > customers_import.txt

# Import
mysqlimport --local \
  --user=root \
  --password \
  --host=localhost \
  --fields-terminated-by='\t' \
  --lines-terminated-by='\n' \
  mydb \
  customers_import.txt
Handling Import Errors
1. Duplicate Key Errors
-- Use IGNORE to skip duplicates
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
IGNORE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Or REPLACE to update existing rows
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
REPLACE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
2. Data Type Mismatch
-- Handle with a SET clause
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
id = NULLIF(@id, ''),
name = NULLIF(TRIM(@name), ''),
email = NULLIF(TRIM(@email), ''),
created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));
3. Large File Performance
-- Disable secondary indexes during the import
-- (DISABLE KEYS only affects MyISAM tables)
ALTER TABLE customers_import DISABLE KEYS;
-- Import data
LOAD DATA INFILE ...
-- Re-enable indexes
ALTER TABLE customers_import ENABLE KEYS;

-- Or, for InnoDB, relax log flushing during the import
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- Import
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
Best Practices
- Always back up before a bulk import
- Test with sample data first
- Use transactions for data integrity
- Monitor server resources (CPU, memory, disk)
- Import during low-traffic hours
- Validate the data before and after the import
- Use a staging table for complex transformations
Conclusion
Choose a method based on your needs:
– LOAD DATA INFILE: fastest for large datasets
– MySQL Workbench: GUI-friendly for small-to-medium data
– Python script: most flexible, with custom validation
– mysqlimport: command-line convenience
With the right preparation, a CSV import can be smooth and reliable.
Written by
Hendra Wijaya