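[mongoimport] Bulk-inserting txt/csv files into MongoDB / specifying field types

I took a txt file of Seoul addresses and related information from a public data source, converted it to CSV, and pushed it into MongoDB with mongoimport. At first I planned to process the data before inserting it, but with about 608,000 records in total that felt pretty daunting.

For converting txt -> csv and laying the file out as a table in Excel, see this post first 👇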

https://blckchainetc.tistory.com/398

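The CSV file has to be laid out as a table so that, later on, each field in MongoDB gets exactly one value.

mongoimport in practice

Once the file is prepared this way, it is ready for mongoimport!

The txt file from the public data source had no field names, so I added them myself.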
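If the file is too large to edit comfortably in Excel, the header row can also be added from the shell. This is only a minimal sketch: the file names and the field names city and district are made up for illustration (only buildingMainNumber appears in my actual data), and the two sample rows stand in for the real CSV.

```shell
#!/bin/sh
# Hypothetical file and field names, for illustration only.
workdir=$(mktemp -d)
raw="$workdir/seoul_raw.csv"
with_header="$workdir/seoul.csv"

# Stand-in for the converted CSV: data rows only, no header row.
printf '%s\n' 'Seoul,Jongno-gu,1' 'Seoul,Jung-gu,22' > "$raw"

# Write the header line first, then append the original rows unchanged.
{
  printf '%s\n' 'city,district,buildingMainNumber'
  cat "$raw"
} > "$with_header"

# Show the first line to confirm the header is in place.
head -n 1 "$with_header"
```

The original file is left untouched and the result goes to a new file, so a mistake in the header only costs a rerun.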

The official mongoimport documentation is well organized, so I recommend referring to it!

mongoimport command syntax

mongoimport <options> <connection-string> <file>

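The range of options is very wide.

Example 1)

mongoimport --db=[db-name] --collection=[collection-name] --type=csv --headerline --file=[path-to-csv]/[file-name].csv

Even if the collection named here does not exist in that db yet, it is created automatically and the data is inserted.

If you don't know the file's path, drag the file into the terminal window and the path appears.

The --headerline option tells mongoimport to use the first row of the table as the field names.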
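Since --headerline takes the field names straight from the first row, it can help to look at that row before importing. The sketch below lists the names one per line and then prints (without running) the matching import command. The file name, the db name test, the collection name addresses, and the sample columns are all assumptions for illustration, and the simple comma split assumes the header contains no quoted commas.

```shell
#!/bin/sh
# Hypothetical sample file, for illustration only.
workdir=$(mktemp -d)
csv="$workdir/seoul.csv"
printf '%s\n' 'city,district,buildingMainNumber' 'Seoul,Jongno-gu,1' > "$csv"

# List the names found in the first row, one per line.
# (Plain split on commas; assumes no quoted commas in the header.)
field_names=$(head -n 1 "$csv" | tr ',' '\n')
printf '%s\n' "$field_names"

# The import command that would use this header (printed, not executed).
echo "mongoimport --db=test --collection=addresses --type=csv --headerline --file=$csv"
```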

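Setting field types with mongoimport

After importing this way, I found that the fields had been given automatically inferred types I did not intend. So I imported again with a command that also specifies the field types.

mongoimport --db=users --collection=contacts --type=csv --columnsHaveTypes --fields="[field1-name].string(),[field2-name].boolean(),[field3-name].int32(),[field4-name].binary(base64)" --file=/example/file.csv

* Two important points when specifying field types *

1. When the types are given through --columnsHaveTypes together with --fields, --headerline cannot be used in the same command (--fields and --headerline are incompatible). Leave --headerline out.

2. Leaving out --headerline means the row of field names at the top of the CSV file must be deleted as well. If you import without deleting it (like I did), mongoimport treats that row as data and fails with an error saying the field name does not match the specified type.

Error msg : Failed: type coercion failure in document #0 for column 'buildingMainNumber', could not parse token 'buildingMainNumber' to type int32

CSV file with the field names removed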
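Deleting the first row by hand is awkward for a file this size, so here is a rough sketch of doing it from the shell instead. The file names, the db name test, the collection name addresses, and the city and district columns are hypothetical; the typed field list only shows the shape that --fields expects with --columnsHaveTypes. The final command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical sample file, for illustration only.
workdir=$(mktemp -d)
csv="$workdir/seoul.csv"
body="$workdir/seoul_noheader.csv"
printf '%s\n' 'city,district,buildingMainNumber' \
  'Seoul,Jongno-gu,1' 'Seoul,Jung-gu,22' > "$csv"

# Copy everything from line 2 onward; the original file is left as-is.
tail -n +2 "$csv" > "$body"

# Typed field list in the same order as the CSV columns (no spaces after commas).
fields='city.string(),district.string(),buildingMainNumber.int32()'

# The typed import command (printed, not executed).
echo "mongoimport --db=test --collection=addresses --type=csv --columnsHaveTypes --fields=\"$fields\" --file=$body"
```

With the header gone, the first row mongoimport sees is real data, so the int32 column no longer receives the text 'buildingMainNumber'.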

And then it works!

All 608,000 records went in without a problem.

Phew.

reference : https://www.mongodb.com/docs/database-tools/mongoimport/
