Today we will learn how to find some structured data and import it into our database.
Generating some dummy data
There are some online tools for generating dummy data for testing purposes. Mockaroo is my favorite.
Let's create some user data and download it as CSV, which is one of the most popular ways to store data in a text file.
Make sure your CSV headers match your table column names, or be ready to map them manually. Handle data types carefully - dates, numbers, and text need to be in the right format. Mockaroo has options to create your data in the correct format.

Preparing the Docker container
We need to run some SQL statements to import the data.
That's great, but how am I going to pass the data to Docker?
Volume mapping in a nutshell:
It's like creating a shared folder between your computer and the Docker container.
The Problem: Docker containers are isolated - they can't see files on your computer by default.
The Solution: Volume mapping creates a "bridge" so both your computer and the container can access the same files.
We need to re-create our Docker container. Luckily we don't have any crucial data to save, so we can delete the container and create it again with a volume mapping.
# stop container
docker stop travel_mania_container
# delete container
docker rm travel_mania_container
# create a folder for the project and
# for the data to share between the host and container
cd ~ && mkdir travel_mania_project \
&& cd ./travel_mania_project \
&& mkdir container_data
# re-create container with a volume mapping
# (use an absolute host path; some Docker versions reject relative paths with -v)
docker run --name travel_mania_container \
-e POSTGRES_USER=ugur \
-e POSTGRES_PASSWORD=ugur1234 \
-e POSTGRES_DB=travel_mania \
-p 5432:5432 \
-v "$(pwd)/container_data":/data \
-d postgres
After this operation, don't forget to re-create the users table.
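For reference, here is a minimal sketch of that users table, assuming the column names used in the rest of this post; the types and lengths are my guesses, so adjust them to match your original definition from Day 1.
-- users table sketch (assumed types and lengths; adjust to your own definition)
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email VARCHAR(255) NOT NULL,
username VARCHAR(50) NOT NULL,
first_name VARCHAR(100),
last_name VARCHAR(100),
created_at TIMESTAMP,
updated_at TIMESTAMP
);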
Importing CSV
You can import your data using SQL statements or using the GUI of your SQL client. But what happens if your column names don't match 1:1 with the table definition?
Using column order
The first option is to rely on column order. The HEADER option tells PostgreSQL to skip the first row, DELIMITER ',' sets the field separator, and the column list after the table name maps each CSV column to a table column by position.
-- Your CSV has: id, email_addr, username, fname, lname, creation_time, update_time
-- Your table has: id, email, username, first_name, last_name, created_at, updated_at
COPY users(id, email, username, first_name, last_name, created_at, updated_at)
FROM '/data/MOCK_DATA.csv'
DELIMITER ','
CSV HEADER;
Now you can run the COPY statement we wrote above. Here is the result:

Notice how the column names are mapped correctly.
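If you prefer checking from psql instead of a GUI, a quick sanity check like this does the job (the exact rows you see will depend on your Mockaroo file):
-- quick sanity check after the import
SELECT COUNT(*) FROM users;
SELECT id, username, first_name, last_name, email FROM users LIMIT 5;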
Using a temp table
You can create a temporary table and load the CSV into it, then insert into the actual table from that temp table. With this method you don't import into your main table directly, and you get the opportunity to map columns more granularly.
-- Create temp table matching your CSV exactly
CREATE TEMP TABLE temp_users (
id INTEGER,
user_email VARCHAR(255),
username VARCHAR(50),
fname VARCHAR(100),
lname VARCHAR(100),
creation_time TIMESTAMP,
update_time TIMESTAMP
);
-- Import to temp table
COPY temp_users FROM '/data/MOCK_DATA.csv' DELIMITER ',' CSV HEADER;
-- Insert into your real table with proper mapping
INSERT INTO users
(id, email, username, first_name, last_name, created_at, updated_at)
SELECT
id, user_email, username, fname, lname, creation_time, update_time
FROM temp_users;
-- Clean up
DROP TABLE temp_users;
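A nice side effect of the temp table approach is that you can clean values on their way into the real table. Here is a small sketch, assuming you wanted to normalize email and name casing; you would use something like this in place of the plain INSERT above, before dropping the temp table:
-- insert with some light cleanup applied to the CSV values
INSERT INTO users
(id, email, username, first_name, last_name, created_at, updated_at)
SELECT
id,
LOWER(TRIM(user_email)), -- normalize email casing and whitespace
username,
INITCAP(fname), -- 'john' becomes 'John'
INITCAP(lname),
creation_time,
update_time
FROM temp_users;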
Practicing queries
Question: Find all users whose first name starts with 'J' and last name starts with 'F'. Order by first name ascending.
Solution:
select
username,
first_name,
last_name,
email
from
users
where
first_name like 'J%'
and last_name like 'F%'
order by
first_name;
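Keep in mind that LIKE is case-sensitive in PostgreSQL. If your mock data has inconsistent casing, ILIKE is the case-insensitive variant; for example:
-- case-insensitive version of the same filter
select username, first_name, last_name, email
from users
where first_name ilike 'j%'
and last_name ilike 'f%'
order by first_name;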
Question: Find users whose username contains numbers and group them by email domain (gmail.com, yahoo.com, etc.).
Solution:
select
SPLIT_PART(email, '@', 2) as email_domain,
COUNT(*) as users_with_numbers_in_username
from
users
where
username ~ '[0-9]'
-- Contains at least one number
group by
SPLIT_PART(email, '@', 2)
having
COUNT(*) > 0
order by
users_with_numbers_in_username desc;
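The same SPLIT_PART expression can be reused in the WHERE clause if you only care about a few specific providers; the domain list below is just an example:
-- same grouping, limited to a couple of example domains
select
SPLIT_PART(email, '@', 2) as email_domain,
COUNT(*) as users_with_numbers_in_username
from users
where username ~ '[0-9]'
and SPLIT_PART(email, '@', 2) in ('gmail.com', 'yahoo.com')
group by SPLIT_PART(email, '@', 2)
order by users_with_numbers_in_username desc;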
End of Day 2
Today we learned how to import a large dataset into the database using the COPY statement.
Tomorrow we will learn about string functions like SPLIT_PART, LENGTH, and TRIM, and date functions like AGE, DATE_PART, and TO_CHAR.