  • Python Crawling Practice - BeautifulSoup Documentation #2 (find_all, find, select, etc.)
    Coding Practice / Learning to Code 2020. 11. 8. 10:25

     

    ■ Python crawling: notes on the BeautifulSoup documentation, part 2

    Notes on find_all(), the related methods such as find(), the CSS-selector methods select() and select_one(), modifying the parse tree, get_text(), encodings, and more

    ### find_all()

    ## find_all() can match by tag name, attributes, string (text), or any combination of these
    from bs4 import BeautifulSoup
    import re
    
    html_doc = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    
    <p class="story">...</p>
    '''
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    print(soup.find_all('b'))  ## find every b tag in the document
    # [<b>The Dormouse's story</b>]
    
    ## using regular expressions
    for tag in soup.find_all(re.compile('^b')):  ## every tag whose name starts with b
        print(tag.name)
        
    #body
    #b
        
    for tag in soup.find_all(re.compile('t')):  ## every tag whose name contains t
        print(tag.name)
        
    #html
    #title
        
    print(soup.find_all(['a', 'b']))  ## pass a list: find every a tag and b tag
    #[<b>The Dormouse's story</b>, <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, \
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, \
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    
    ## passing find_all() a function that checks whether an attribute exists
    ## here: tags that have a class attribute but no id attribute
    
    print(soup.p.has_attr('class'))
    # True
    
    print(soup.p.has_attr('id'))
    # False
    
    def has_class_but_no_id(tag):
        return tag.has_attr('class') and not tag.has_attr('id')
    
    print(soup.find_all(has_class_but_no_id))  ## finds the p tags
    #[<p class="title"><b>The Dormouse's story</b></p>, 
    # <p class="story">Once upon a time there were three little sisters; and their names were
    #<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    #<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    #<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    #and they lived at the bottom of a well.</p>, <p class="story">...</p>]
    
    
    ## when a function is passed for a specific attribute, its argument is the attribute value, not the tag
    def contains_lacie(href):
        return href and re.compile('lacie').search(href)  ## search href for 'lacie' with a compiled re object
    
    print(soup.find_all(href = contains_lacie))  ## pass the function as the value of the href attribute
    # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    
    def not_lacie(href):
        return href and not re.compile('lacie').search(href)
    
    print(soup.find_all(href = not_lacie))  ## tags whose href does not contain 'lacie'
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    
    ## a closer look at find_all()
    ## find_all(name, attrs, recursive, string, limit, **kwargs)
    
    print(soup.find_all('title'))  ## every title tag (name matches tag names only; strings are ignored)
    # [<title>The Dormouse's story</title>]
    
    print(soup.find_all('p', 'title'))  ## p tags whose class is title
    # [<p class="title"><b>The Dormouse's story</b></p>]
    
    print(soup.find_all('a'))
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(soup.find_all(id = 'link2'))  ## tags whose id is link2
    # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    ## tags whose href attribute value matches the regular expression
    print(soup.find_all(href=re.compile("elsie")))  
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    
    print(soup.find_all(id=True))  ## every tag that has an id attribute
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    
    print(soup.find_all(href = re.compile('elsie'), id = 'link1'))
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    
    
    
    ## attribute names that can't be used as keyword arguments: HTML5 data-* attributes
    data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
    #print(data_soup.find_all(data-foo="value"))  ## raises an error
    # SyntaxError: keyword can't be an expression
    
    ## data-* attributes can still be searched by passing a dict as the attrs argument
    print(data_soup.find_all(attrs = {'data-foo':'value'}))
    # [<div data-foo="value">foo!</div>]
    
    ## the same technique works for HTML's name attribute (the name keyword is reserved for tag names)
    name_soup = BeautifulSoup('<input name="email"/>', 'html.parser')
    print(name_soup.find_all(name="email"))
    # []
    print(name_soup.find_all(attrs={"name": "email"}))
    # [<input name="email"/>]
    
    
    print(soup.find_all('p', attrs={'class': 'title'}))
    # [<p class="title"><b>The Dormouse's story</b></p>]
    print(soup.find_all('p', {'class': 'title'}))
    # [<p class="title"><b>The Dormouse's story</b></p>]
    
    print(soup.find_all('a', class_='sister'))  ## same as find_all('a', {'class': 'sister'})
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(soup.find_all(class_=re.compile("itl")))  ## tags whose class value contains the string 'itl'
    # [<p class="title"><b>The Dormouse's story</b></p>]
    
    
    ## passing a function as the class value (class is not None and is 6 characters long)
    def has_six_characters(css_class):
        return css_class is not None and len(css_class) == 6
    
    print(soup.find_all(class_=has_six_characters))
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    
    ## multi-valued class attributes
    css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
    print(css_soup.find_all('p', class_='strikeout'))
    # [<p class="body strikeout"></p>]
    
    print(css_soup.find_all('p', class_='body'))
    # [<p class="body strikeout"></p>]
    
    print(css_soup.find_all('p', class_="body strikeout"))
    # [<p class="body strikeout"></p>]
    
    print(css_soup.find_all('p', class_="strikeout body"))  ## no match: the whole string must equal the attribute value, order included
    # []
    
    
    ## CSS selectors match regardless of order
    print(css_soup.select('p.strikeout.body'))
    # [<p class="body strikeout"></p>]
    print(css_soup.select('p.body.strikeout'))
    # [<p class="body strikeout"></p>]
    print(css_soup.select('p.strikeout'))
    # [<p class="body strikeout"></p>]
    print(css_soup.select('p.body'))
    # [<p class="body strikeout"></p>]
    
    
    
    ### searching with the string argument instead of tag names
    ## find_all(name, attrs, recursive, string, limit, **kwargs)
    print(soup.find_all(string='Elsie'))
    # ['Elsie']
    
    print(soup.find_all(string=['Tillie', 'Elsie', 'Lacie']))
    # ['Elsie', 'Lacie', 'Tillie']
    
    print(soup.find_all(string=re.compile('Dormouse')))
    # ["The Dormouse's story", "The Dormouse's story"]
    
    print(soup.find_all('a', string='Elsie'))  ## string works here since BeautifulSoup 4.4.0
    # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
    
    print(soup.find_all('a', text='Elsie'))  ## before 4.4.0, the argument was called text
    # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
    
    
    ### using the limit argument (works like SQL's LIMIT keyword)
    print(soup.find_all('a', limit = 2))
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    ### using the recursive argument
    ## find_all() searches all descendants; pass recursive=False to search only direct children
    html_tag = '''
    <html>
     <head>
      <title>The Dormouse's story</title>
     </head>
     <body>
      <div>
       <p>test</p>
      </div>
     </body>
    </html>'''
    
    soup = BeautifulSoup(html_tag, 'html.parser')
    print(soup.html.find_all('title'))
    # [<title>The Dormouse's story</title>]
    print(soup.html.find_all('title', recursive = False))  ## title is not a direct child of html
    # []
    
    print(soup.body.find_all('p'))
    # [<p>test</p>]
    print(soup.body.find_all('p', recursive = False))
    # []
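    For contrast, a small sketch (same markup as above, not part of the original notes) showing that recursive=False does match once the search starts from the actual direct parent:

    ```python
    from bs4 import BeautifulSoup

    html_tag = '''
    <html>
     <head>
      <title>The Dormouse's story</title>
     </head>
     <body>
      <div>
       <p>test</p>
      </div>
     </body>
    </html>'''

    soup = BeautifulSoup(html_tag, 'html.parser')

    # title IS a direct child of head, and p IS a direct child of div
    print(soup.head.find_all('title', recursive=False))
    # [<title>The Dormouse's story</title>]
    print(soup.find('div').find_all('p', recursive=False))
    # [<p>test</p>]
    ```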
    

     


     

    ### Other methods like find_all() and find()...

    ## find_parents(name, attrs, string, limit, **kwargs)
    ## find_parent(name, attrs, string, **kwargs)
    
    from bs4 import BeautifulSoup
    import re
    
    html_doc = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    
    <p class="story">...</p>
    '''
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    a_string = soup.find(string = 'Lacie')
    print(a_string)
    # Lacie
    
    print(a_string.find_parents('a'))  ## a-tag ancestors of the string 'Lacie'; returns a list
    # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    print(a_string.find_parent('p'))  ## the p-tag parent of the string 'Lacie'
    #<p class="story">Once upon a time there were three little sisters; and their names were
    #<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    #<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    #<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    #and they lived at the bottom of a well.</p>
    
    
    ## find_next_siblings() and find_next_sibling()
    
    first_link = soup.a
    print(first_link)
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
    
    print(first_link.find_next_siblings('a'))
    #[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    #<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(first_link.find_next_sibling('a'))
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
    
    first_story_paragraph = soup.find('p', 'story')
    print(first_story_paragraph.find_next_sibling('p'))
    # <p class="story">...</p>
    
    
    ## find_previous_siblings() and find_previous_sibling()
    ## find_all_next() and find_next()
    ## find_all_previous() and find_previous()
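    The remaining pairs listed above work the same way, only in a different search direction. A quick sketch (my own example, same three-sisters markup):

    ```python
    from bs4 import BeautifulSoup

    html_doc = '''
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    '''
    soup = BeautifulSoup(html_doc, 'html.parser')

    second_link = soup.find('a', id='link2')

    print(second_link.find_previous_sibling('a'))  ## the a tag just before it in the same p
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

    print(second_link.find_next('p'))  ## the first p tag appearing after it in the document
    # <p class="story">...</p>

    print(second_link.find_previous('p'))  ## the nearest preceding p (its own parent counts)
    ```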

     

    ### CSS selectors

    ## the select() and select_one() methods
    
    from bs4 import BeautifulSoup
    import re
    
    html_doc = '''
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    
    <p class="story">...</p>
    '''
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    print(soup.select('title'))  ## title tags; returns a list
    # [<title>The Dormouse's story</title>]
    
    print(soup.select('p:nth-of-type(3)'))  ## the 3rd p tag among its siblings
    # [<p class="story">...</p>]
    
    print(soup.select('body a'))  ## a tags anywhere under body
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(soup.select('html head title'))  # title tags under head, under html
    # [<title>The Dormouse's story</title>]
    
    print(soup.select('head > title'))  # title tags that are direct children of head
    # [<title>The Dormouse's story</title>]
    
    print(soup.select('p > a:nth-of-type(2)'))  ## the 2nd a tag among direct children of a p
    # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    print(soup.select('p > #link1'))  ## direct child of a p whose id is link1
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    
    print(soup.select('body > a'))  ## a tags directly under body --> none; returns an empty list
    # []
    
    
    ## sibling combinators ('+' and '~')
    ## A + B : selects a B immediately following an A
    ## A ~ B : selects every B that follows an A
    
    print(soup.select('#link1 + .sister'))  ## the .sister immediately after #link1
    # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    print(soup.select('#link1 ~ .sister'))  ## every .sister after #link1
    #[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    #<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(soup.select('.sister'))  ## tags whose class is sister
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    ## [attribute ~= value] : matches elements whose attribute value contains the given word
    print(soup.select('[class ~= sister]'))
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    
    print(soup.select('#link1'))  ## the tag whose id is link1
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    
    print(soup.select("a#link2"))  ## a tags whose id is link2
    # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    print(soup.select('#link1, #link2'))  ## tags whose id is link1 or link2
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    
    print(soup.select('a[href]'))  ## a tags that have an href attribute
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(soup.select('a[href="http://example.com/elsie"]'))  ## a tags whose href equals this exact value
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    
    print(soup.select('a[href$="tillie"]'))  ## a tags whose href ends with tillie
    # [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    print(soup.select('a[href*=".com/el"]'))  ## a tags whose href contains .com/el
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
    
    
    ### select_one(): find the first matching tag
    print(soup.select_one('.sister'))
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
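    One behavioral difference worth noting (a small sketch, not from the original notes): when nothing matches, select() returns an empty list while select_one() returns None:

    ```python
    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p class="a">one</p><p class="a">two</p>', 'html.parser')

    print(soup.select_one('.a'))  ## first match only
    # <p class="a">one</p>
    print(soup.select('.missing'))  ## no match --> empty list
    # []
    print(soup.select_one('.missing'))  ## no match --> None
    # None
    ```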
    

     

    ### Modifying the parse tree

    soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
    b_tag = soup.b
    
    b_tag.name = "blockquote"  ## rename the tag
    b_tag['class'] = 'verybold'  ## add a class attribute
    b_tag['id'] = 1  ## add an id attribute
    print(b_tag)
    # <blockquote class="verybold" id="1">Extremely bold</blockquote>
    
    del b_tag['class']
    del b_tag['id']
    print(b_tag)
    # <blockquote>Extremely bold</blockquote>
    
    print(b_tag.string)
    # Extremely bold
    b_tag.string = 'New Bold'
    print(b_tag)
    # <blockquote>New Bold</blockquote>
    
    
    b_tag.append('...')  ## append content; a NavigableString() works too
    print(b_tag)
    # <blockquote>New Bold...</blockquote>
    
    
    b_tag.extend([' and', ' Boldest'])  ## use extend() to append a list
    print(b_tag)
    # <blockquote>New Bold... and Boldest</blockquote>
    
    print(b_tag.contents)
    # ['New Bold', '...', ' and', ' Boldest']
    
    
    ### Adding a new tag: new_tag()
    soup = BeautifulSoup('<b></b>', 'html.parser')
    original_tag = soup.b
    new_tag = soup.new_tag('a', href = 'http://example.com')
    original_tag.append(new_tag)
    print(original_tag)
    # <b><a href="http://example.com"></a></b>
    
    
    ### Emptying a tag's contents: clear()
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, 'html.parser')
    soup.a.clear()
    print(soup.a)
    # <a href="http://example.com/"></a>
    
    
    ### extract(): removing a tag or string from the tree
    ## then re-add the extracted tag and contents with new_tag() and append()
    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup, 'html.parser')
    
    print(soup.a)
    # <a href="http://example.com/">I linked to <i>example.com</i></a>
    print(soup.i.parent)
    # <a href="http://example.com/">I linked to <i>example.com</i></a>
    
    i_tag = soup.i.extract()
    
    print(soup.a)
    # <a href="http://example.com/">I linked to </a>
    
    print(i_tag)
    # <i>example.com</i>
    
    print(i_tag.parent)
    # None
    ## caution: new_tag() keyword arguments become attribute names as-is, so class_
    ## creates a literal class_ attribute; use attrs={'class': 'i_cls'} to set class
    new_tag = soup.new_tag('i', class_ = 'i_cls')
    soup.a.append(new_tag)
    print(soup.a)
    # <a href="http://example.com/">I linked to <i class_="i_cls"></i></a>
    
    soup.i.append('example.com')  ## note the order when appending strings
    print(soup.a)
    # <a href="http://example.com/">I linked to <i class_="i_cls">example.com</i></a>
    
    ## decompose(): destroying a tag and its contents completely
    i_tag = soup.i.decompose()
    print(soup.a)
    # <a href="http://example.com/">I linked to </a>
    
    print(i_tag)
    # None
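    As of Beautiful Soup 4.9.0 (the version these notes follow), the decomposed property reports whether a tag has been destroyed; a small sketch of my own:

    ```python
    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<a href="http://example.com/">I linked to <i>example.com</i></a>',
                         'html.parser')
    i_tag = soup.i
    i_tag.decompose()  ## returns None; destroys the tag in place

    print(i_tag.decomposed)  ## the destroyed tag reports True
    # True
    print(soup.a.decomposed)  ## tags still in the tree report False
    # False
    print(soup.a)
    # <a href="http://example.com/">I linked to </a>
    ```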
    
    
    ## insert(): adding a string to a tag at a chosen position
    tag = soup.a
    tag.insert(0, 'You, ')
    print(soup.a)
    # <a href="http://example.com/">You, I linked to </a>
    
    tag.insert(2, '...')
    print(soup.a)
    # <a href="http://example.com/">You, I linked to ...</a>
    
    i_tag = soup.new_tag('i')  ## create an i tag
    i_tag.string = 'example.com'  ## set its string
    soup.a.append(i_tag)  ## append the i tag to the a tag
    print(soup.a)
    # <a href="http://example.com/">You, I linked to ...<i>example.com</i></a>
    
    
    
    ### replace_with(): replacing a page element
    new_tag = soup.new_tag('b')
    new_tag.string = 'example.net'
    soup.a.i.replace_with(new_tag)  ## replace the i tag with the b tag
    print(soup.a)
    # <a href="http://example.com/">You, I linked to ...<b>example.net</b></a>
    
    
    ### wrap(): wrapping an element in a tag
    soup = BeautifulSoup('<p>I wish I was bold.</p>', 'html.parser')
    soup.p.string.wrap(soup.new_tag('b'))  ## wrap the string in a b tag
    print(soup)
    # <p><b>I wish I was bold.</b></p>
    
    soup.p.wrap(soup.new_tag('div'))  ## wrap the p tag in a div tag
    print(soup)
    # <div><p><b>I wish I was bold.</b></p></div>

     

    ### get_text(): human-readable text

    ## get_text() returns all the text under a document or tag as a single Unicode string
    markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
    soup = BeautifulSoup(markup, 'html.parser')
    print(soup.get_text())
    # \nI linked to example.com\n
    
    print(soup.get_text('|'))  ## join the text pieces with '|'
    # '\nI linked to |example.com|\n'
    
    print(soup.get_text('|', strip = True))  ## '|' separator, with surrounding whitespace and newlines stripped from each piece
    # I linked to|example.com
    
    print(soup.i.get_text())
    # example.com
    
    str_strip = list(soup.stripped_strings)  ## the stripped text pieces as a list
    print(str_strip)
    # ['I linked to', 'example.com']

     

    ### Encodings

    • Beautiful Soup uses a sub-library called Unicode, Dammit to detect a document's encoding and convert the HTML or XML to Unicode
    • The detected encoding is available as the .original_encoding attribute of the BeautifulSoup object
    • If you already know the document's encoding, you can pass it to the BeautifulSoup constructor as from_encoding
    • If you don't know the exact encoding and Unicode, Dammit guesses wrong, you can rule candidates out with exclude_encodings
    markup = "<b>\N{SNOWMAN}</b>"
    soup = BeautifulSoup(markup, 'html.parser')
    print(soup.b)
    # <b>☃</b>
    print(soup.encode("utf-8"))
    # b'<b>\xe2\x98\x83</b>'
    print(soup.decode("utf-8"))  ## note: decode()'s first parameter is pretty_print, so a truthy value pretty-prints
    #<b>
    # ☃
    #</b>
    
    markup = b"<h1>Sacr\xc3\xa9 bleu!</h1>"  ## bytes input, so Unicode, Dammit detects the encoding
    soup = BeautifulSoup(markup, 'html.parser')
    print(soup.h1)
    # <h1>Sacré bleu!</h1>
    print(soup.decode('utf-8'))  ## pretty-printed again (first argument is pretty_print)
    #<h1>
    # Sacré bleu!
    #</h1>
    print(soup.original_encoding)
    # utf-8
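    A short sketch of the from_encoding option mentioned above (my own example; the EUC-KR bytes are just an illustrative choice):

    ```python
    from bs4 import BeautifulSoup

    # bytes whose encoding we happen to know in advance (EUC-KR, chosen for illustration)
    markup = '<b>안녕하세요</b>'.encode('euc-kr')

    soup = BeautifulSoup(markup, 'html.parser', from_encoding='euc-kr')
    print(soup.b.string)
    # 안녕하세요
    print(soup.original_encoding)
    # euc-kr

    # if autodetection guesses wrong, a bad candidate can be excluded instead:
    # BeautifulSoup(markup, 'html.parser', exclude_encodings=['ISO-8859-7'])
    ```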

     

    [Reference] Beautiful Soup 4.9.0 documentation https://www.crummy.com/software/BeautifulSoup/bs4/doc/

     
