
How to Import CSV into MySQL: LOAD DATA INFILE and Workbench with Automatic Parsing

· 9 min read

Importing data from CSV is a common task in migrations and data integration. Having imported tens of millions of rows from various sources, I'll share the most efficient and reliable methods.

CSV Preparation

1. The Ideal CSV Format

id,name,email,created_at
1,John Doe,[email protected],2024-01-15
2,Jane Smith,[email protected],2024-01-16
3,Bob Johnson,[email protected],2024-01-17

2. Pre-processing the CSV

# Check the encoding
file -i data.csv

# Convert to UTF-8 if needed
iconv -f ISO-8859-1 -t UTF-8 data.csv > data_utf8.csv

# Remove the BOM (Byte Order Mark)
sed '1s/^\xEF\xBB\xBF//' data_utf8.csv > data_clean.csv

# Normalize line endings
dos2unix data_clean.csv

# Validate the CSV structure
head -5 data_clean.csv
wc -l data_clean.csv
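The structural check can also be scripted. This is a minimal Python sketch (the helper name `find_ragged_rows` is illustrative, not a standard tool) that flags rows whose column count differs from the header:

```python
import csv

def find_ragged_rows(path, encoding="utf-8-sig"):
    """Return (row_number, column_count) for rows that do not match the header.

    utf-8-sig transparently strips a UTF-8 BOM if one is present.
    """
    bad = []
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader)
        expected = len(header)
        for row_num, row in enumerate(reader, start=2):  # header is row 1
            if len(row) != expected:
                bad.append((row_num, len(row)))
    return bad

# Usage (data_clean.csv as in the shell example above):
# for row_num, count in find_ragged_rows("data_clean.csv"):
#     print(f"Row {row_num}: {count} columns")
```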

3. Handle Special Characters

# Strip quotes (lossy; only if the quotes carry no meaning)
sed 's/"//g' data.csv > data_noquotes.csv

# Or escape them properly
awk -F, '{gsub(/"/,"\"\""); print}' data.csv > data_escaped.csv

# Handle newlines inside fields:
# use a proper CSV parser such as csvkit
csvformat -U 1 data.csv > data_formatted.csv
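Stripping all quotes with sed is lossy; a CSV-aware parser preserves fields that contain embedded commas, escaped quotes, and newlines. A small illustration with Python's standard csv module:

```python
import csv
import io

# A field with an embedded comma, an escaped quote (""), and a newline.
raw = 'id,comment\n1,"Hello, ""world""\nsecond line"\n'

# Naive line splitting tears the record apart:
naive = raw.strip().split("\n")

# A real CSV parser reassembles it correctly:
rows = list(csv.reader(io.StringIO(raw)))

print(len(naive))   # 3 physical lines
print(rows[1])      # ['1', 'Hello, "world"\nsecond line']
```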

Method 1: LOAD DATA INFILE (Fastest)

1. Enable File Permissions

-- Check the current setting
SHOW VARIABLES LIKE 'secure_file_priv';

-- If the result is /var/lib/mysql-files/,
-- copy the CSV into that directory

# Copy the CSV into the secure_file_priv directory
sudo cp data.csv /var/lib/mysql-files/
sudo chown mysql:mysql /var/lib/mysql-files/data.csv

2. Create Table

CREATE TABLE customers_import (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at DATE,
    INDEX idx_email (email),
    INDEX idx_created (created_at)
) ENGINE=InnoDB;

3. Basic LOAD DATA

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id, name, email, created_at);

4. Advanced LOAD DATA with Transformation

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
    id = @id,
    name = TRIM(@name),
    email = LOWER(TRIM(@email)),
    created_at = STR_TO_DATE(@created_at, '%Y-%m-%d');

5. LOAD DATA with Validation

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
    id = @id,
    name = NULLIF(TRIM(@name), ''),
    email = CASE 
        WHEN @email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' 
        THEN LOWER(TRIM(@email)) 
        ELSE NULL 
    END,
    created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));

6. LOAD DATA for Large Files (Chunked)

-- Split the large file into chunks
-- Import chunk 1 (rows 1-1000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk1.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Import chunk 2 (rows 1000001-2000000)
LOAD DATA INFILE '/var/lib/mysql-files/data_chunk2.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
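The splitting itself can be scripted so that every chunk keeps a header row (which is why each chunk is loaded with IGNORE 1 ROWS). A minimal sketch assuming the data_chunkN.csv naming from the example above; `split_csv` is an illustrative helper, not a standard tool:

```python
import csv

def split_csv(path, rows_per_chunk, prefix="data_chunk"):
    """Split a CSV into header-preserving chunks: data_chunk1.csv, data_chunk2.csv, ...

    Uses csv.reader/csv.writer so rows with embedded newlines are never cut in half.
    Returns the number of chunk files written.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk_num, out, writer = 0, None, None
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:
                if out:
                    out.close()
                chunk_num += 1
                out = open(f"{prefix}{chunk_num}.csv", "w",
                           newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)  # repeat the header in every chunk
            writer.writerow(row)
        if out:
            out.close()
    return chunk_num
```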

7. LOAD DATA from a Remote Client (LOCAL)

# From the client machine, load the file through the MySQL client
# (requires local_infile to be enabled on both server and client)
mysql --local-infile=1 -u root -p mydb -e "
LOAD DATA LOCAL INFILE '/path/to/local/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
"

Method 2: MySQL Workbench

1. Table Data Import Wizard

Steps:
1. Connect to the database
2. Right-click the schema → Table Data Import Wizard
3. Select the CSV file
4. Configure:
   - Destination table (existing or new)
   - Encoding: utf8mb4
   - Field separator: ,
   - Line separator: \n
   - Enclose strings: "
   - Escape character: \
   - First line is header: ✓
5. Map the columns
6. Review and import

2. Import with a Stored Procedure

DELIMITER $$

CREATE PROCEDURE ImportCSVData(
    IN p_file_path VARCHAR(255),
    IN p_table_name VARCHAR(64)
)
BEGIN
    SET @sql = CONCAT(
        'LOAD DATA INFILE ''', p_file_path, '''',
        ' INTO TABLE ', p_table_name,
        ' FIELDS TERMINATED BY '','' ',
        ' OPTIONALLY ENCLOSED BY ''"'' ',
        ' LINES TERMINATED BY ''\n'' ',
        ' IGNORE 1 ROWS'
    );

    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END$$

DELIMITER ;

-- Usage
CALL ImportCSVData('/var/lib/mysql-files/data.csv', 'customers_import');

Note: many MySQL versions do not allow LOAD DATA in the prepared statement protocol (error 1295), so test this procedure on your version first; if it fails, build the statement client-side and send it as a plain query instead.
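When LOAD DATA cannot be prepared server-side, one alternative is to assemble the statement client-side and execute it as a plain query. A hedged sketch in Python; `build_load_data_sql` is an illustrative helper, and the whitelist regex is an assumption you should adjust to your own naming rules:

```python
import re

def build_load_data_sql(file_path, table_name):
    """Assemble a LOAD DATA INFILE statement as a plain string.

    Table names cannot be bound as query parameters, so validate against
    a strict whitelist pattern before interpolating (assumption: table
    names contain only letters, digits, and underscores).
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_name):
        raise ValueError(f"Unsafe table name: {table_name!r}")
    escaped_path = file_path.replace("'", "''")  # escape single quotes in the path
    return (
        f"LOAD DATA INFILE '{escaped_path}' "
        f"INTO TABLE `{table_name}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' IGNORE 1 ROWS"
    )

# Usage: cursor.execute(build_load_data_sql('/var/lib/mysql-files/data.csv',
#                                           'customers_import'))
```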

Method 3: Python Script with mysql-connector

1. Basic Python Import

import csv
import mysql.connector
from mysql.connector import Error

def import_csv_to_mysql(csv_file, table_name):
    conn = None
    try:
        # Connect to the database
        conn = mysql.connector.connect(
            host='localhost',
            database='your_database',
            user='your_username',
            password='your_password'
        )
        cursor = conn.cursor()

        # Read the CSV
        with open(csv_file, 'r', encoding='utf-8') as file:
            csv_reader = csv.DictReader(file)

            # Prepare the INSERT statement
            columns = csv_reader.fieldnames
            placeholders = ', '.join(['%s'] * len(columns))
            query = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"

            # Batch the inserts for performance
            batch_size = 1000
            batch = []

            for row in csv_reader:
                batch.append(tuple(row.values()))

                if len(batch) >= batch_size:
                    cursor.executemany(query, batch)
                    conn.commit()
                    print(f"Inserted {len(batch)} rows")
                    batch = []

            # Insert the remaining rows
            if batch:
                cursor.executemany(query, batch)
                conn.commit()
                print(f"Inserted {len(batch)} rows")

        print(f"Import completed: {csv_file}")

    except Error as e:
        print(f"Error: {e}")
    finally:
        if conn and conn.is_connected():
            cursor.close()
            conn.close()

# Usage
import_csv_to_mysql('data.csv', 'customers_import')

2. Advanced Python Import with Validation

import csv
import mysql.connector
import re
from datetime import datetime

class CSVImporter:
    def __init__(self, host, database, user, password):
        self.conn = mysql.connector.connect(
            host=host, database=database, user=user, password=password
        )
        self.cursor = self.conn.cursor()
        self.stats = {'imported': 0, 'skipped': 0, 'errors': 0}

    def validate_email(self, email):
        pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
        return re.match(pattern, email) is not None

    def validate_date(self, date_str, fmt='%Y-%m-%d'):
        try:
            datetime.strptime(date_str, fmt)
            return True
        except ValueError:
            return False

    def transform_row(self, row):
        """Transform and validate one row."""
        transformed = {}

        # ID
        try:
            transformed['id'] = int(row['id'])
        except (ValueError, KeyError):
            return None, "Invalid ID"

        # Name
        name = row.get('name', '').strip()
        if not name:
            return None, "Empty name"
        transformed['name'] = name[:100]  # Truncate

        # Email
        email = row.get('email', '').strip().lower()
        if not self.validate_email(email):
            return None, f"Invalid email: {email}"
        transformed['email'] = email

        # Date
        date_str = row.get('created_at', '')
        if not self.validate_date(date_str):
            return None, f"Invalid date: {date_str}"
        transformed['created_at'] = date_str

        return transformed, None

    def import_file(self, csv_file, table_name, batch_size=1000):
        with open(csv_file, 'r', encoding='utf-8') as file:
            reader = csv.DictReader(file)

            query = f"INSERT INTO {table_name} (id, name, email, created_at) VALUES (%s, %s, %s, %s)"
            batch = []

            for row_num, row in enumerate(reader, start=2):  # start=2: the header is row 1
                transformed, error = self.transform_row(row)

                if error:
                    print(f"Row {row_num}: {error}")
                    self.stats['skipped'] += 1
                    continue

                batch.append((
                    transformed['id'],
                    transformed['name'],
                    transformed['email'],
                    transformed['created_at']
                ))

                if len(batch) >= batch_size:
                    self._insert_batch(query, batch)
                    batch = []

            if batch:
                self._insert_batch(query, batch)

        self._print_stats()

    def _insert_batch(self, query, batch):
        try:
            self.cursor.executemany(query, batch)
            self.conn.commit()
            self.stats['imported'] += len(batch)
            print(f"Inserted batch of {len(batch)} rows")
        except Exception as e:
            self.conn.rollback()
            self.stats['errors'] += len(batch)
            print(f"Batch insert error: {e}")

    def _print_stats(self):
        print("\n=== Import Statistics ===")
        print(f"Imported: {self.stats['imported']}")
        print(f"Skipped: {self.stats['skipped']}")
        print(f"Errors: {self.stats['errors']}")

    def close(self):
        self.cursor.close()
        self.conn.close()

# Usage

importer = CSVImporter('localhost', 'mydb', 'user', 'pass')
importer.import_file('data.csv', 'customers_import')
importer.close()

Method 4: mysqlimport Command Line

# The file name must match the table name
# Table: customers_import → file: customers_import.txt

# Convert the CSV to mysqlimport's format
# (tab-separated, header stripped)
csvformat -T data.csv | tail -n +2 > customers_import.txt

# Import
mysqlimport --local \
  --user=root \
  --password \
  --host=localhost \
  --fields-terminated-by='\t' \
  --lines-terminated-by='\n' \
  mydb \
  customers_import.txt

Handling Import Errors

1. Duplicate Key Errors

-- Use IGNORE to skip duplicates
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
IGNORE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Or REPLACE to overwrite existing rows
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
REPLACE INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

2. Data Type Mismatch

-- Handle it with a SET clause
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE customers_import
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@id, @name, @email, @created_at)
SET
    id = NULLIF(@id, ''),
    name = NULLIF(TRIM(@name), ''),
    email = NULLIF(TRIM(@email), ''),
    created_at = IF(@created_at = '', NULL, STR_TO_DATE(@created_at, '%Y-%m-%d'));

3. Large File Performance

-- Disable secondary indexes during the import
-- (DISABLE KEYS affects non-unique indexes on MyISAM tables)
ALTER TABLE customers_import DISABLE KEYS;

-- Import the data
LOAD DATA INFILE ...

-- Re-enable the indexes
ALTER TABLE customers_import ENABLE KEYS;

-- Or, for InnoDB, relax log flushing during the import
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- Import
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

Best Practices

  1. Always back up before a bulk import
  2. Test with a sample of the data first
  3. Use transactions for data integrity
  4. Monitor server resources (CPU, memory, disk)
  5. Import during low-traffic hours
  6. Validate the data before and after the import
  7. Use a staging table for complex transformations
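Point 6 can start as simply as comparing row counts before and after the import. A minimal sketch; the SELECT COUNT(*) in the comment assumes the customers_import table from earlier:

```python
import csv

def count_csv_data_rows(path, has_header=True):
    """Count logical data rows in a CSV.

    csv.reader handles quoted fields with embedded newlines, so this
    matches the number of records a correct import should produce,
    not just the number of physical lines.
    """
    with open(path, newline="", encoding="utf-8") as f:
        n = sum(1 for _ in csv.reader(f))
    return n - 1 if has_header else n

# After the import, compare against the database, e.g.:
#   cursor.execute("SELECT COUNT(*) FROM customers_import")
#   assert cursor.fetchone()[0] == count_csv_data_rows("data.csv")
```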

Conclusion

Choose a method based on your needs:
LOAD DATA INFILE: fastest for large datasets
MySQL Workbench: GUI-friendly for small-to-medium data
Python script: the most flexible, with custom validation
mysqlimport: command-line convenience

With the right preparation, a CSV import can be smooth and reliable.

Written by

Hendra Wijaya
