
Python Crawling - Selenium Practice #3

by good4me 2020. 11. 12.

goodthings4me.tistory.com

 

Scraping CGV movie reviews

from selenium import webdriver
import time

# Written against Selenium 3. Newer Selenium 4 releases drop the
# find_element_by_* helpers in favor of find_element(By.ID, ...) and
# expect the driver path to be passed through a Service object.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')  # run Chrome without a visible window

def get_movie_reviews(url, page_num=10):
    wd = webdriver.Chrome('C:/Temp/chromedriver.exe', options=chrome_options)
    writer_list = []
    review_list = []
    date_list = []

    try:
        wd.get(url)

        for page_no in range(1, page_num + 1):
            page_ul = wd.find_element_by_id('paging_point')  # <ul> tag that holds the page numbers
            page_a = page_ul.find_element_by_link_text(str(page_no))  # link whose text is the page number
            page_a.click()  # click the page number
            time.sleep(1)  # give the review list time to reload

            writers = wd.find_elements_by_class_name('writer-name')
            writer_list += [writer.text for writer in writers]

            reviews = wd.find_elements_by_class_name('box-comment')
            review_list += [review.text for review in reviews]

            dates = wd.find_elements_by_class_name('day')
            date_list += [date.text for date in dates]
    finally:
        wd.quit()  # always shut down the headless browser, even after an error

    # Combine the three lists into (writer, review, date) tuples
    return list(zip(writer_list, review_list, date_list))


url = 'http://www.cgv.co.kr/movies/detail-view/?midx=83815'  # movie: 도굴
movie_review = get_movie_reviews(url, 2)
print(movie_review)

[Output]

[('유정나', '이제훈 콧날에 집중 안됨 ㅠㅠㅠ 존잼이긴 함 ㅠㅠㅠ', '2020.11.12'), ('지아짱?', '코로나로 인해 극장나들이 정말 오래만에 해서 기뻤어요 적당히 재밋고 잔잔한 웃음이 있네요~^^', '2020.11.12'), ('s.s', '조선왕릉선릉대박도굴꾼대박', '2020.11.12'), ('ki**s1144', '오락성이지만 비교육적 결말에 헛된 꿈이 들게한다 쉽게사는 인생사의 고통이 뭍혀버렸다', '2020.11.12'), ('수정', '간만에 집중ㆍ 시간가는줄 모르고 영화봤어요 강추강추', '2020.11.12'), ('db**sqja', '마스크 쓴 무대인사라 아쉽', '2020.11.12'), ('paulhan99', '무난무난한 한국형 오락영화', '2020.11.12'), ('qk**ndud7632', '재밌었어요! 권선징악이네요.', '2020.11.12'), ('or**sv7986', '킬링타임용으로 보면 될듯해요~', '2020.11.12'), ('ls**1300', '쿨잼오랜만에웃으면서봤네요', '2020.11.12'), ('mini7208', '재치있는 영화 칭찬합니다~', '2020.11.12'), ('엘모띠', '다시볼겁니다. 중간에 잠듬', '2020.11.12')]
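One thing to keep in mind with the listing above: the writer, comment, and date lists are collected independently and only joined at the end with zip, which pairs items by position and stops at the shortest list. The sketch below uses made-up data (not CGV output) to show what would happen if one of the three lookups returned fewer elements than the others. The helper name `combine_columns` is hypothetical and only serves the illustration.

```python
def combine_columns(writers, comments, dates):
    """Zip three parallel lists, refusing to pair them if their lengths differ."""
    if not (len(writers) == len(comments) == len(dates)):
        raise ValueError(
            f'column lengths differ: {len(writers)}, {len(comments)}, {len(dates)}'
        )
    return list(zip(writers, comments, dates))


# Made-up data: the date of user_b's review was not captured.
writers = ['user_a', 'user_b', 'user_c']
comments = ['good', 'so-so', 'great']
dates = ['2020.11.11', '2020.11.13']

# Plain zip pairs by position and stops at the shortest list, so user_b
# silently receives user_c's date and user_c disappears from the result.
print(list(zip(writers, comments, dates)))

# The checked version reports the mismatch instead of hiding it.
try:
    combine_columns(writers, comments, dates)
except ValueError as err:
    print(err)
```

Reading all three fields from the same list item, one review at a time, avoids this kind of positional mismatch altogether.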

 


 

※ Here the for loop in the extraction step is rewritten so that each review is read from its own <li> element:

# Reuses webdriver, time and chrome_options from the first listing.
from selenium.common.exceptions import NoSuchElementException

def get_cgv_movie_reviews(url, page_num=10):
    wd = webdriver.Chrome('C:/Temp/chromedriver.exe', options=chrome_options)
    movie_review = []

    try:
        wd.get(url)

        for page_no in range(1, page_num + 1):
            page_ul = wd.find_element_by_id('paging_point')  # <ul> tag that holds the page numbers
            page_a = page_ul.find_element_by_link_text(str(page_no))  # link whose text is the page number
            page_a.click()  # click the page number
            time.sleep(1)  # give the review list time to reload

            review_ul = wd.find_element_by_id('movie_point_list_container')

            for li in review_ul.find_elements_by_tag_name('li'):
                # Reset per item so a value never carries over from the previous <li>
                writer_name = ''
                for a in li.find_elements_by_class_name('commentMore'):
                    if a.text:
                        writer_name = a.text

                try:
                    day = li.find_element_by_class_name('day').text
                    comment = li.find_element_by_class_name('box-comment').text
                except NoSuchElementException:
                    continue  # this <li> is not a review item

                if not (writer_name and day and comment):
                    continue  # skip items with an empty field

                movie_review.append(
                    {'writer_name': writer_name, 'day': day, 'comment': comment}
                )
    finally:
        wd.quit()  # always shut down the headless browser, even after an error

    return movie_review


url = 'http://www.cgv.co.kr/movies/detail-view/?midx=83825'  # movie: 내가 죽던 날
movie_review = get_cgv_movie_reviews(url, 2)

for review in movie_review:
    print(f'{review["writer_name"]} / {review["day"]} / {review["comment"]}')
    

[Output]

gm**77 / 2020.11.12 / 좋았어요… 역시 김혜수
까는맛 / 2020.11.12 / 느린 호흡 느린 흡입력 아쉬운 결말
이마반 / 2020.11.12 / 꼭봐야할 영화 가슴울리는 명작
co**os10000 / 2020.11.12 / 이정은 님의 연기가 정말 좋아요. 출연 배우들 모두 연기 구멍없이 모두들 명연기 펼쳤어요.
na**ol1 / 2020.11.12 / 잘 봤어요.다시 한번 볼 생각입니다
영화좋아 / 2020.11.12 / 영화 내가 죽던날 스토리가 좋았습니다
볼땡이 / 2020.11.12 / 잼나게 잘봤구요. 배우님들 연기 정말 좋아요.
irenesarah / 2020.11.12 / 엄마랑 같이 봤어요 마음이 따뜻해지는 영화예요
엘 / 2020.11.12 / 마음이 따뜻해지는 영화...
졸가리 / 2020.11.12 / 김혜수 이정은이 연기가 너무 너무 좋았어요
정처없이 / 2020.11.12 / 많이~ 음~ 과연~ 이렇게 할 수 있을까?
행복한택시 / 2020.11.12 / 연기는 말할것도 없고 마음 따뜻해져서 좋았습니당 강추합니다
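The find_element_by_* calls used above belong to the Selenium 3 API this post was written with. For readers on Selenium 4, the per-item scraper might look roughly like the sketch below. The function name `get_cgv_reviews_v4`, the `driver_path` parameter, and the 10-second wait are assumptions for illustration. The element IDs and class names are the ones used in this post, and the CGV page may have changed since then.

```python
import time


def get_cgv_reviews_v4(url, driver_path, page_num=10):
    """Selenium 4 style sketch of the per-<li> review scraper (hypothetical name)."""
    # Imported here so this file can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    # Selenium 4 takes the driver path through a Service object.
    wd = webdriver.Chrome(service=Service(driver_path), options=options)
    reviews = []

    try:
        wd.get(url)
        for page_no in range(1, page_num + 1):
            # Wait until the pager <ul> exists instead of assuming it is already there.
            page_ul = WebDriverWait(wd, 10).until(
                EC.presence_of_element_located((By.ID, 'paging_point'))
            )
            page_ul.find_element(By.LINK_TEXT, str(page_no)).click()
            time.sleep(1)  # same fixed pause as the original listings

            review_ul = wd.find_element(By.ID, 'movie_point_list_container')
            for li in review_ul.find_elements(By.TAG_NAME, 'li'):
                # Non-empty texts of the writer links inside this item
                names = [a.text
                         for a in li.find_elements(By.CLASS_NAME, 'commentMore')
                         if a.text]
                try:
                    day = li.find_element(By.CLASS_NAME, 'day').text
                    comment = li.find_element(By.CLASS_NAME, 'box-comment').text
                except NoSuchElementException:
                    continue  # this <li> is not a review item

                if names and day and comment:
                    reviews.append(
                        {'writer_name': names[-1], 'day': day, 'comment': comment}
                    )
    finally:
        wd.quit()  # always close the browser

    return reviews
```

It would be called like the listing above, with the driver path passed explicitly, for example `get_cgv_reviews_v4(url, 'C:/Temp/chromedriver.exe', 2)`.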

 

[Reference] 이수안컴퓨터연구소

 
