[Crawling] Crawling Hollys store location data (pandas / del[:])


신_이나 2023. 3. 8. 20:15

Let's crawl the store information from Hollys and save it to a CSV file!

https://www.hollys.co.kr/store/korea/korStore2.do


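Inspecting the source of Hollys' store-finder page shows that the store list is laid out as an HTML table.

Each store is one tr row. Inside it, the td cells hold the region, store name, status, address, store services, and phone number, in that order, and I wrote the code around that layout.

The rows really are split per store: for example, 청주율량현대점 and 연세대학교원주장례식장점 each get their own row.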
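To see how the td indices map to fields, here is a minimal sketch that parses one hand-written row. The HTML is a hypothetical sample that follows the column order described above; the values are placeholders, not real Hollys data. It uses get_text(strip=True) instead of .string, because .string returns None when a cell has more than one child node (for example whitespace plus a tag).

```python
from bs4 import BeautifulSoup

# Hypothetical sample row: the cell order follows the layout described above,
# but every value is a placeholder.
SAMPLE_HTML = """
<table><tbody>
  <tr>
    <td>Region A</td>
    <td>Sample Store</td>
    <td>Open</td>
    <td>123 Sample-ro</td>
    <td><img alt="service icon"></td>
    <td>000-0000-0000</td>
  </tr>
</tbody></table>
"""


def parse_rows(html):
    """Turn each <tr> in the first <tbody> into [name, region, status, address, phone]."""
    tbody = BeautifulSoup(html, "html.parser").find("tbody")
    rows = []
    for tr in tbody.find_all("tr"):
        tds = tr.find_all("td")
        # Index 4 (store services) only holds icons, so it is skipped.
        rows.append([tds[i].get_text(strip=True) for i in (1, 0, 2, 3, 5)])
    return rows


print(parse_rows(SAMPLE_HTML))
# [['Sample Store', 'Region A', 'Open', '123 Sample-ro', '000-0000-0000']]
```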

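<Code>

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd


def hollys(result):
    # Fetch the first page of the store list and parse it.
    html = urlopen('https://www.hollys.co.kr/store/korea/korStore2.do')
    bs = BeautifulSoup(html, 'html.parser')
    tag_tbody = bs.find('tbody')
    # Each <tr> is one store; its <td> cells are
    # [region, store name, status, address, store services, phone].
    for store in tag_tbody.find_all('tr'):
        store_td = store.find_all('td')
        store_name = store_td[1].get_text(strip=True)
        store_sido = store_td[0].get_text(strip=True)
        store_now = store_td[2].get_text(strip=True)
        store_address = store_td[3].get_text(strip=True)
        # store_td[4] (store services) is skipped.
        store_phone = store_td[5].get_text(strip=True)

        result.append([store_name, store_sido, store_now, store_address, store_phone])


def main():
    result = []
    print('Hollys store crawling >>>>>>>>>>>>>>>>>')
    hollys(result)
    # One row per store; the column order matches the append() above.
    hollys_tbl = pd.DataFrame(result, columns=['store', 'sido-gu', 'now', 'address', 'phone'])
    # cp949 is the Korean Windows encoding.
    hollys_tbl.to_csv('/Users/shinjiwon/Desktop/크롤링/hollys.csv', encoding='cp949', mode='w', index=True)
    # Empty the list (not strictly needed, since result is local to main).
    del result[:]


if __name__ == '__main__':
    main()
```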

That is the final code. Since this was my first time using pandas, I referred to another blog while writing it.

The code in main is the pandas part: it turns the collected rows into a DataFrame and saves it as a CSV file.
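Here is a minimal sketch of that pandas step on its own. The two rows are placeholders in the same order hollys() appends them. The sketch writes to an in-memory buffer instead of a file, so it runs anywhere. The encoding='cp949' argument from the original only matters when writing to a real file, so it is left out here.

```python
import io

import pandas as pd

# Placeholder rows in the order hollys() appends them:
# [store, sido-gu, now, address, phone]
result = [
    ["Store A", "Region A", "Open", "Address A", "000-0000-0001"],
    ["Store B", "Region B", "Open", "Address B", "000-0000-0002"],
]

hollys_tbl = pd.DataFrame(result, columns=["store", "sido-gu", "now", "address", "phone"])

# Write to a text buffer instead of a file; index=True adds the row-number column.
buffer = io.StringIO()
hollys_tbl.to_csv(buffer, index=True)
csv_text = buffer.getvalue()
print(csv_text)
```

With index=True, the first column of the CSV is the DataFrame's row index (0, 1, ...), and its header cell is empty.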

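1) del is a Python statement (not a function) that deletes names, list elements, or slices. del result[:] deletes every element of the result list and leaves the same list object empty, just like result.clear(). To delete only the first element, use del result[0]. To delete the last two, use del result[-2:]; del result[2:] would instead delete everything from the third element onward.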

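2) Also, until now I had always used findAll, but the code above uses find_all. They do the same thing: find_all is the method name in BeautifulSoup 4, and findAll is the older BeautifulSoup 3 name that bs4 still accepts as an alias.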
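Item 1) above can be checked with plain Python lists. The list contents are arbitrary.

```python
# Deleting by index and by slice with the del statement.
result = ["a", "b", "c", "d", "e"]
del result[0]        # only the first element
print(result)        # ['b', 'c', 'd', 'e']
del result[-2:]      # the last two elements
print(result)        # ['b', 'c']

# A positive start index is not "the last N": [2:] means index 2 to the end.
items = ["a", "b", "c", "d", "e"]
del items[2:]
print(items)         # ['a', 'b']

# del x[:] empties the list in place, so every name bound to it sees the change
# (same effect as x.clear()); rebinding with x = [] would not.
data = [1, 2, 3]
alias = data
del data[:]
print(alias)         # []
```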
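For item 2), here is a quick check that both names return the same tags. The tiny HTML string is made up for the test. Recent bs4 releases may emit a DeprecationWarning for findAll, which is one more reason to prefer find_all.

```python
from bs4 import BeautifulSoup

html = "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

new_style = soup.find_all("td")   # BeautifulSoup 4 name
old_style = soup.findAll("td")    # older camelCase alias

print([td.get_text() for td in new_style])  # ['1', '2', '3']
print(new_style == old_style)               # True
```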

[Figure: result of running the code above]

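However, Hollys' store listings are not on a single page; they are spread over 52 pages in total. To collect all of them, loop over the page numbers with range and put each one into the pageNo parameter of the URL. Because range stops before its end value, reaching page 52 takes range(1, 53).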
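The off-by-one is easy to miss, so here is a small sketch that only builds the page URLs without fetching anything. The query string follows the URL used in the loop. The 52-page count is the one stated in the text and may change as stores open and close. The page_urls helper is a hypothetical name for illustration.

```python
BASE_URL = "https://www.hollys.co.kr/store/korea/korStore2.do"


def page_urls(last_page):
    """Hypothetical helper: build the listing URLs for pages 1..last_page (inclusive)."""
    # range() excludes its stop value, so stop at last_page + 1.
    return [f"{BASE_URL}?pageNo={page}&sido=&gugun=&store=" for page in range(1, last_page + 1)]


urls = page_urls(52)
print(len(urls))                # 52
print(len(list(range(1, 52))))  # 51: range(1, 52) would skip the last page
```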

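```python
# Replaces hollys() in the code above; the imports and main() stay the same.
def hollys(result):
    # Added: visit every page of the store list (pages 1 to 52).
    # range() excludes its stop value, so the stop must be 53.
    for page in range(1, 53):
        url = 'https://www.hollys.co.kr/store/korea/korStore2.do?pageNo=%d&sido=&gugun=&store=' % page
        html = urlopen(url)

        bs = BeautifulSoup(html, 'html.parser')
        tag_tbody = bs.find('tbody')
        # Each <tr> is one store; see the cell order described earlier.
        for store in tag_tbody.find_all('tr'):
            store_td = store.find_all('td')
            store_name = store_td[1].get_text(strip=True)
            store_sido = store_td[0].get_text(strip=True)
            store_now = store_td[2].get_text(strip=True)
            store_address = store_td[3].get_text(strip=True)
            store_phone = store_td[5].get_text(strip=True)

            result.append([store_name, store_sido, store_now, store_address, store_phone])
```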

Sites I referred to
