devugur

Day 2 - Importing data and practicing queries

06 Aug 2025


Today we will learn how to find some structured data and import it into our database.

Generating some dummy data

There are some online tools for generating dummy data for testing purposes. Mockaroo is my favorite.

Let's create some user data and download it as CSV, one of the most popular plain-text data formats.

Make sure your CSV headers match your table column names, or be ready to map them manually. Handle data types carefully - dates, numbers, and text need to be in the right format. Mockaroo has options to create your data in the correct format.
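For instance, a generated users CSV might start like this (the values here are made up for illustration; the header names mirror the ones used later in this post):

```csv
id,email_addr,username,fname,lname,creation_time,update_time
1,jdoe@example.com,jdoe42,John,Doe,2024-01-15 09:30:00,2024-06-01 12:00:00
2,asmith@example.com,asmith,Anna,Smith,2024-02-20 14:05:00,2024-06-02 08:45:00
```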

Mockaroo screenshot

Preparing docker container

We need to run a SQL statement to import the data.

That's great but how am I going to pass the data to docker?

Volume mapping in a nutshell:

It's like creating a shared folder between your computer and the Docker container.

The Problem: Docker containers are isolated - they can't see files on your computer by default.

The Solution: Volume mapping creates a "bridge" so both your computer and the container can access the same files.

We need to re-create our docker container. Luckily we don't have any crucial data to save, so we can delete the container and create it again with a volume mapping.

# stop container
docker stop travel_mania_container

# delete container
docker rm travel_mania_container

# create a folder for the project and
# for the data to share between the host and container
cd ~ && mkdir travel_mania_project \
&& cd ./travel_mania_project \
&& mkdir container_data

# re-create container with a volume mapping
docker run --name travel_mania_container \
  -e POSTGRES_USER=ugur \
  -e POSTGRES_PASSWORD=ugur1234 \
  -e POSTGRES_DB=travel_mania \
  -p 5432:5432 \
  -v "$(pwd)/container_data":/data \
  -d postgres

After this operation, don't forget to re-create the users table.
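If you need a reference, a minimal users table matching the columns used below could look like this. The exact types and constraints are an assumption here; adjust them to whatever you defined on Day 1.

```sql
-- A minimal sketch of the users table; types and constraints
-- are assumptions, adjust to your original Day 1 definition.
CREATE TABLE users (
  id         INTEGER PRIMARY KEY,
  email      VARCHAR(255),
  username   VARCHAR(50),
  first_name VARCHAR(100),
  last_name  VARCHAR(100),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);
```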

Importing CSV

You can import your data using SQL statements or the GUI of your SQL client. But what happens if your CSV column names do not match 1:1 with the table definition?

Sample csv
My sample CSV does not match the actual table definition, so we need to map the column names somehow.

Using column order

The first option is to rely on column order. The HEADER option tells PostgreSQL to skip the first row, the column list after the table name maps each CSV column (by position) to the right table column, and DELIMITER ',' sets the field separator.

-- Your CSV has:   id, email_addr, username, fname,      lname,     creation_time, update_time
-- Your table has: id, email,      username, first_name, last_name, created_at,    updated_at

COPY users(id, email, username, first_name, last_name, created_at, updated_at)
FROM '/data/MOCK_DATA.csv'
DELIMITER ','
CSV HEADER;

Now you can run the COPY statement we wrote above. Here is the result:

Import result
psql select result

Notice how the column names are mapped correctly.
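To double-check the import, a couple of quick sanity queries help. The row count shown is whatever you generated in Mockaroo, so your number may differ:

```sql
-- Quick sanity checks after the import
SELECT COUNT(*) FROM users;
SELECT * FROM users LIMIT 5;
```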

Using a temp table

You can create a temporary table matching the CSV, import into it, and later insert into the actual table from there. With this method you don't import into your main table directly, and you get the opportunity to map columns more granularly.

-- Create temp table matching your CSV exactly
CREATE TEMP TABLE temp_users (
  id INTEGER,
  user_email VARCHAR(255),
  username VARCHAR(50),
  fname VARCHAR(100),
  lname VARCHAR(100),
  creation_time TIMESTAMP,
  update_time TIMESTAMP
);

-- Import to temp table
COPY temp_users FROM '/data/MOCK_DATA.csv' DELIMITER ',' CSV HEADER;

-- Insert into your real table with proper mapping
INSERT INTO users
  (id, email, username, first_name, last_name, created_at, updated_at)
SELECT
  id, user_email, username, fname, lname, creation_time, update_time
FROM temp_users;

-- Clean up
DROP TABLE temp_users;
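One advantage of the temp-table route is that you can also convert types during the INSERT ... SELECT. A hedged sketch, assuming the timestamp columns had to be imported as plain text first:

```sql
-- Hypothetical variant: if timestamps arrive as text,
-- declare them TEXT in the temp table and cast on insert.
INSERT INTO users
  (id, email, username, first_name, last_name, created_at, updated_at)
SELECT
  id, user_email, username, fname, lname,
  creation_time::timestamp,
  update_time::timestamp
FROM temp_users;
```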

Practicing queries

Question: Find all users whose first name starts with 'J' and last name starts with 'F'. Order by first name ascending.

Solution:

select
	username,
	first_name,
	last_name,
	email
from
	users
where
	first_name like 'J%'
	and last_name like 'F%'
order by
	first_name;

Question: Find users whose username contains numbers and group them by email domain (gmail.com, yahoo.com, etc.).

Solution:

select
	SPLIT_PART(email, '@', 2) as email_domain,
	COUNT(*) as users_with_numbers_in_username
from
	users
where
	username ~ '[0-9]'
	-- Contains at least one number
group by
	SPLIT_PART(email, '@', 2)
order by
	users_with_numbers_in_username desc;

End of Day 2

Today we learned how to import a large dataset into the database using the COPY statement.

Tomorrow we will learn about string functions like SPLIT_PART, LENGTH, and TRIM, and date functions like AGE, DATE_PART, and TO_CHAR.