MySQL - 자동차 매출 데이터를 이용한 리포트 작성

본 글은 SQL을 공부하기 위해

<SQL로 맛보는 데이터 전처리 분석> 속 예제를 연습하고 기록한 글이다.

매일 배운 내용을 잘 기록해서 제대로 소화하는 게 이 글의 최종 목표이다.

막간 도서 소개

4장

자동차 매출 데이터를 이용한 리포트 작성

1. 데이터 생성하기

https://www.mysqltutorial.org/mysql-sample-database.aspx

MySQL Sample Database

This page provides you with a MySQL sample database that helps you to practice with MySQL effectively and quickly. You can download the sample database and load it into your MySQL Server.

www.mysqltutorial.org

위 링크에 접속해 Download MySQL Sample Database를 다운받는다.

다운로드한 파일은 MySQL Workbench에서 실행한다.

2. 데이터 구조 이해하기

이제 Classicmodels의 테이블 구조를 알아보기 위해

아래와 같이 ERD(Entity Relational Diagram)를 작성한다.

크게 고객, 매출, 상품으로 구성되어 있고

각 테이블 특정 칼럼으로 연결되어 JOIN 함수를 써야 정보를 조합해 데이터를 조회할 수 있다.

3. 구매 지표 추출

1) 매출액 (일자별, 월별, 연도별)

a) 일별 매출액 조회

USE classicmodels;

SELECT A.orderdate
    , SUM(priceeach * quantityordered) as sales
FROM orders A
LEFT JOIN orderdetails B
ON A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;

필요한 건 주문일자와 판매액 정보이다.

주문일자(orderdate)는 orders 테이블에 존재,

판매액(priceeach * quantityordered)은 orderdetails 테이블에 존재한다.

각 테이블은 ordernumber이라는 특정 열로 연결되고 있어

LEFT JOIN을 ordernumber 열로 테이블을 조합한다.

이후 일별 매출액을 집계하려면

GROUP BY를 통해 주문일자별로 그룹핑한 뒤

집계함수 SUM을 사용해 priceeach * quantityordered의 합으로 구한다.

orderdate	sales
2003-01-06	93
2003-01-09	87
2003-01-10	73
2003-01-29	399
2003-01-31	320
2003-02-11	170
...	...

b) 월별 매출액 조회

orderdate는 'yyyy-mm-dd'의 형태로 구성되어 있다.

이중 'yyyy-mm'만 가져오려면 원하는 문자열로 자르는 SUBSTR() 함수를 사용한다.

-- SUBSTR 함수 예시

-- SUBSTR(컬럼, 위치, 길이)
SELECT SUBSTR('2003-01-04', 1, 7);
-- > '2003-01'

SELECT SUBSTR(A.orderdate, 1, 7) AS MM
	, SUM(priceeach & quantityordered) as SALES
FROM orders A
LEFT JOIN orderdetails B
ON A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;

일자별 매출액과 코드가 거의 동일하나,

다른 건 SELECT 첫 번째 계산식으로 SUBSTR이 들어가고 별칭으로 MM으로 만든 것이다.

해당 계산식으로 GROUP BY, ORDER BY를 하기 때문에 GROUP BY 1, ORDER BY 1은 그대로 유지한다.

MM	SALES
2003-01	972
2003-02	569
2003-03	915
2003-04	965
2003-05	1136
2003-06	873
...	...

c) 연도별 매출액 조회

SELECT SUBSTR(A.orderdate, 1, 4) as YY
	, SUM(priceeach & quantityordered) as SALES
FROM orders A
LEFT JOIN orderdetails B
ON A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;

월별 매출액 조회와 동일하지만 SUBSTR의 길이를 '4'로 조정해

orderdate에서 'yyyy' 정보만 추출하도록 수정한다.

그 외 내용은 동일하다.

YY	SALES
2003	18879
2004	25702
2005	10459

2) 구매자 수, 구매 건수(일자별, 월별, 연도별)

a) 일자별 구매자 수, 구매건수 조회

SELECT 
	orderdate
	, COUNT(DISTINCT customernumber) AS 구매자수
    , COUNT(ordernumber) AS 구매건수
FROM orders
GROUP BY 1
ORDER BY 1;

결과예시

orderdate	구매자수	구매건수
2003-01-06	1	1
2003-01-09	1	1
2003-01-10	1	1
2003-01-29	1	1

b) 구매자 수가 2명 이상인 날짜 조회 (서브쿼리)

SELECT A.*
FROM (SELECT 
	orderdate
	, COUNT(DISTINCT customernumber) AS 구매자수
    , COUNT(ordernumber) AS 구매건수
FROM orders
GROUP BY 1
ORDER BY 1
) A
WHERE A.구매자수 >= 2;

여기에서 A는 서브쿼리의 결과 집합을 가리키는 별칭이다.

별칭을 사용해 코드의 가독성을 향상시킨다.

결과예시

orderdate	구매자수	구매건수
2003-05-21	2	2
2003-06-16	2	2
2003-10-06	2	2
2003-10-10	2	2

3) 인당 매출액 AMV (연도별, 월별)

-- 연도별 인당 구매금액(AMV) 구하기 : 매출액 / 구매자 수

SELECT SUBSTR(A.orderdate, 1, 4) as YY
		, COUNT(DISTINCT A.customernumber) AS 구매자수
		, SUM(priceeach * quantityordered) AS SALES
        , SUM(priceeach * quantityordered) / COUNT(DISTINCT A.customernumber) AS AMV
FROM orders A
LEFT JOIN orderdetails B
on A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;


-- 월별 인당 구매금액 구하기
SELECT SUBSTR(A.orderdate, 1, 7) as MM
		, COUNT(DISTINCT A.customernumber) AS 구매자수
		, SUM(priceeach * quantityordered) AS SALES
        , SUM(priceeach * quantityordered) / COUNT(DISTINCT A.customernumber) AS AMV
FROM orders A
LEFT JOIN orderdetails B
on A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;

결과 예시(월별)

MM	구매건수	SALES	AMV
2003-01	5	116692.77	23338.554000
2003-02	3	128403.64	42801.213333
2003-03	6	160517.14	26752.856667
2003-04	7	185848.59	26549.798571

4) 건당 매출액 ATV (연도별, 월별)

-- 연도별 건당 구매금액(ATV) 구하기 : 매출액 / 구매건수
-- 1건의 거래가 평균적으로 얼마의 매출을 일으키는가?
SELECT SUBSTR(A.orderdate, 1, 4) as YY
		, COUNT(DISTINCT A.ordernumber) AS 구매건수
		, SUM(priceeach * quantityordered) AS SALES
        , SUM(priceeach * quantityordered) / COUNT(DISTINCT A.ordernumber) AS ATV
FROM orders A
LEFT JOIN orderdetails B
on A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;

-- 월별 건강 구매금액(ATV) 구하기
SELECT SUBSTR(A.orderdate, 1, 7) as MM
		, COUNT(DISTINCT A.ordernumber) AS 구매건수
		, SUM(priceeach * quantityordered) AS SALES
        , SUM(priceeach * quantityordered) / COUNT(DISTINCT A.ordernumber) AS ATV
FROM orders A
LEFT JOIN orderdetails B
on A.ordernumber = B.ordernumber
GROUP BY 1
ORDER BY 1;

결과 예시(월별)

MM	구매건수	SALES	ATV
2003-01	5	116692.77	23338.554000
2003-02	3	128403.64	42801.213333
2003-03	6	160517.14	26752.856667
2003-04	7	185848.59	26549.798571

3. 그룹별 구매 지표 구하기

1) 국가별 도시별 매출액

SELECT C.country
		, C.city
		, SUM(B.priceeach * B.quantityordered) AS SALES
FROM orders A
LEFT JOIN orderdetails B
on A.ordernumber = B.ordernumber
LEFT JOIN customers C
on A.customernumber = C.customernumber
GROUP BY 1, 2
ORDER BY 1, 2;

결과예시

country	city	SALES
Australia	Chatswood	133907.12
Australia	Glen Waverly	55866.02
Australia	Melbourne	180585.07
Australia	North Sydney	137034.22

2) 북미, 비북미별 매출액

-- 북미, 비북미 구분 (CASE WHEN 조건 THEN 결과 END)
SELECT 
	CASE WHEN country IN ('USA', 'Canada') THEN 'North America'
    ELSE 'Others' END CNTRY
    , country
FROM customers;


-- 북미, 비북미 매출액 
SELECT 
	CASE WHEN COUNTRY IN ('USA', 'Canada') THEN 'North America'
	ELSE 'Others' END CNTRY,
SUM(B.priceeach * B.quantityordered) AS SALES
FROM orders A
LEFT JOIN orderdetails B
on A.ordernumber = B.ordernumber
LEFT JOIN customers C
on A.customernumber = C.customernumber
GROUP BY 1
ORDER BY 1;

결과예시

CNTRY	SALES
North America	3479191.91
Others	6124998.70

3) 매출 Top 5 국가 및 매출

-- 국가별 Top Product 및 매출
CREATE TABLE stat AS
SELECT
	C.country
    , SUM(priceeach * quantityordered) SALES
FROM orders A
LEFT JOIN orderdetails B
ON A.ordernumber = B.ordernumber
LEFT JOIN customers C
ON A.customernumber = C.customernumber
GROUP BY 1
ORDER BY 2 DESC;

-- 서브쿼리 활용
SELECT
	country
    ,  SALES
    , DENSE_RANK() OVER(ORDER BY SALES DESC) RNK
FROM stat;


SELECT *
FROM (SELECT
	country
    ,  SALES
    , DENSE_RANK() OVER(ORDER BY SALES DESC) RNK
FROM stat) TOP
WHERE TOP.RNK <= 5

결과 예시

country	SALES	RNK
USA	3273280.05	1
Spain	1099389.09	2
France	1007374.02	3
Australia	562582.59	4
New Zealand	476847.01	5

4. 재구매율

여기에서 재구매율은 같은 고객이 특정 기간(EX. 2018년)에 구매한 경우를 찾고

이를 기반으로 1년 후 (EX.2019년)에 연달아 구매한 구매자의 비중을 의미한다.

SELECT A.customernumber,
		A.orderdate,
		B.customernumber,
        B.orderdate
FROM orders A
LEFT JOIN orders B
ON A.customernumber = B.customernumber 
AND SUBSTR(A.orderdate, 1, 4) = SUBSTR(B.orderdate, 1, 4)-1;

결과 예시

customernumber	orderdate	customernumber	orderdate
363	2003-01-06	363	2004-11-04
128	2003-01-09	128	2004-03-15
128	2003-01-09	128	2004-11-05
181	2003-01-10	181	2004-04-05

위의 결과를 통해 각 연도별 구매자 수를 집계한다면

ex. 2019년 구매자 수 / 2018년 구매자 수 로 나누어

최종적으로 각 연도의 재구매율을 구할 수 있다.

1) 국가별 2004, 2005 Retention Rate(%)

SELECT C.country
		, SUBSTR(A.orderdate, 1, 4) YY
        , COUNT(DISTINCT A.customernumber) BU_1
        , COUNT(DISTINCT B.customernumber) BU_2
        , COUNT(DISTINCT B.customernumber) / COUNT(DISTINCT A.customernumber) RETENTION_RATE
FROM ORDERS A
LEFT JOIN ORDERS B
ON A.customernumber = B.customernumber
AND SUBSTR(A.orderdate, 1, 4) = SUBSTR(B.orderdate, 1, 4)-1
LEFT JOIN customers C
ON A.customernumber = C.customernumber
GROUP BY 1, 2;

이전 쿼리가 거의 비슷하지만 국가별로 그룹핑해야하기 때문에

고객의 거주 국가 정보를 결합하는 LEFT JOIN 함수가 사용되어야 한다.

결과 예시

country	YY	BU_1	BU_2	RETENTION_RATE
Australia	2003	5	3	0.6000
Australia	2004	3	2	0.6667
Australia	2005	4	0	0.0000
Austria	2003	2	1	0.5000
Austria	2004	1	1	1.0000
Austria	2005	2	0	0.0000

'Database' 카테고리의 다른 글

MySQL - UK Commerce 데이터를 이용한 리포트 작성 (0)	2023.10.12
MySQL - 식품 배송 데이터 분석 (1)	2023.10.10
MySQL - 상품 리뷰 데이터를 이용한 리포트 작성 (1)	2023.10.10
MySQL 기본구문 - 분석을 위한 조건을 정확하게! 필터 역할 'WHERE' (0)	2023.10.10
MySQL 기본구문 - 데이터를 선택하기 위한 SELECT, FROM (0)	2023.10.04

Nayoon's Data Analytics

MySQL - 자동차 매출 데이터를 이용한 리포트 작성

4장

자동차 매출 데이터를 이용한 리포트 작성

1. 데이터 생성하기

2. 데이터 구조 이해하기

3. 구매 지표 추출

1) 매출액 (일자별, 월별, 연도별)

2) 구매자 수, 구매 건수(일자별, 월별, 연도별)

3) 인당 매출액 AMV (연도별, 월별)

4) 건당 매출액 ATV (연도별, 월별)

3. 그룹별 구매 지표 구하기

1) 국가별 도시별 매출액

2) 북미, 비북미별 매출액

3) 매출 Top 5 국가 및 매출

4. 재구매율

1) 국가별 2004, 2005 Retention Rate(%)

'Database' 카테고리의 다른 글

티스토리툴바

MySQL - 자동차 매출 데이터를 이용한 리포트 작성

4장

자동차 매출 데이터를 이용한 리포트 작성

1. 데이터 생성하기

2. 데이터 구조 이해하기

3. 구매 지표 추출

1) 매출액 (일자별, 월별, 연도별)

2) 구매자 수, 구매 건수(일자별, 월별, 연도별)

3) 인당 매출액 AMV (연도별, 월별)

4) 건당 매출액 ATV (연도별, 월별)

3. 그룹별 구매 지표 구하기

1) 국가별 도시별 매출액

2) 북미, 비북미별 매출액

3) 매출 Top 5 국가 및 매출

4. 재구매율

1) 국가별 2004, 2005 Retention Rate(%)

'Database' 카테고리의 다른 글

관련글

티스토리툴바