U o a(!@sddlmZmZmZddlmZmZddlmZm Z ddl Z ddl Z ddl m Z ddlmZmZmZmZddlmZdd lmZdd lmZzdd lmZWnek reZYnXed d eDZedd eDZedd eDZeeddgBZdZej rJeddkr&e!ddks*t"e #edde$ddZ%n e #eZ%e&ddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5g Z'e #d6Z(iZ)Gd7d8d8e*Z+d9d:Z,Gd;d<dd>e-Z.Gd?d@d@e/Z0GdAdBdBe*Z1GdCdDdDe*Z2dEdFZ3dS)G)absolute_importdivisionunicode_literals) text_type binary_type) http_clienturllibN) webencodings)EOFspaceCharacters asciiLettersasciiUppercase)_ReparseException)_utils)StringIO)BytesIOcCsg|]}|dqSasciiencode.0itemrbC:\Users\vtejo\AppData\Local\Temp\pip-unpacked-wheel-6mt8ur68\pip\_vendor\html5lib\_inputstream.py srcCsg|]}|dqSrrrrrrrscCsg|]}|dqSrrrrrrrs>.)sumr#r%rrrr,^szBufferedStream._bufferedBytescCs<|j|}|j||jdd7<t||jd<|Sr')r"r4r#appendr$r()r%r3datarrrr1as   zBufferedStream._readStreamcCs|}g}|jd}|jd}|t|jkr|dkr|dks>t|j|}|t||krl|}|||g|_n"t||}|t|g|_|d7}|||||||8}d}q|r|||d|S)Nrr )r$r(r#r-r7r1join)r%r3remainingBytesrv bufferIndex bufferOffset bufferedData bytesToReadrrrr2hs&     zBufferedStream._readFromBufferN) __name__ __module__ __qualname____doc__r&r+r0r4r,r1r2rrrrr!9s  r!cKst|tjs(t|tjjr.t|jtjr.d}n&t|drJt|dt }n t|t }|rdd|D}|rvt d|t |f|St |f|SdS)NFr4rcSsg|]}|dr|qS) _encoding)endswith)rxrrrrs z#HTMLInputStream..z3Cannot set an encoding with a unicode input, set %r) isinstancer HTTPResponserresponseaddbasefphasattrr4r TypeErrorHTMLUnicodeInputStreamHTMLBinaryInputStream)sourcekwargs isUnicode encodingsrrrHTMLInputStreams       rUc@speZdZdZdZddZddZddZd d Zd d Z d dZ dddZ ddZ ddZ dddZddZdS)rOProvides a unicode stream of characters to the HTMLTokenizer. This class takes care of character encoding and removing or replacing incorrect byte-sequences and also provides column and line tracking. i(cCsZtjsd|_ntddkr$|j|_n|j|_dg|_tddf|_| ||_ | dS)Initialises the HTMLInputStream. HTMLInputStream(source, [encoding]) -> Normalized stream from source for use by html5lib. source can be either a file-object, local filename or a string. The optional encoding parameter must be a string that indicates the encoding. If specified, that encoding will be used, regardless of any BOM or later declaration (such as in a meta element) Nu􏿿r rutf-8certain) rsupports_lone_surrogatesreportCharacterErrorsr(characterErrorsUCS4characterErrorsUCS2newLineslookupEncoding charEncoding openStream dataStreamreset)r%rQrrrr&s   zHTMLUnicodeInputStream.__init__cCs.d|_d|_d|_g|_d|_d|_d|_dS)Nr)r* chunkSize chunkOffseterrors prevNumLines prevNumCols_bufferedCharacterr6rrrrcszHTMLUnicodeInputStream.resetcCst|dr|}nt|}|SzvProduces a file object from source. source can be either a file object, local filename or a string. r4)rMrr%rQr"rrrras z!HTMLUnicodeInputStream.openStreamcCsT|j}|dd|}|j|}|dd|}|dkr@|j|}n ||d}||fS)N rrr )r*countrhrfindri)r%r.r*nLines positionLine lastLinePospositionColumnrrr _positions   z HTMLUnicodeInputStream._positioncCs||j\}}|d|fS)z:Returns (line, col) of the current position in the stream.r )rtrf)r%linecolrrrr$szHTMLUnicodeInputStream.positioncCs6|j|jkr|stS|j}|j|}|d|_|S)zo Read one character from the stream or queue if available. Return EOF when EOF is reached. r )rfre readChunkr r*)r%rfcharrrrrxs   zHTMLUnicodeInputStream.charNcCs|dkr|j}||j\|_|_d|_d|_d|_|j|}|j rX|j |}d|_ n|s`dSt |dkrt |d}|dksd|krdkrnn|d|_ |dd}|j r| || d d }| d d }||_t ||_d S) NrdrFr r iz rm T)_defaultChunkSizertrerhrir*rfrbr4rjr(ordr[replace)r%rer8lastvrrrrws0           z HTMLUnicodeInputStream.readChunkcCs(ttt|D]}|jdqdS)Ninvalid-codepoint)ranger(invalid_unicode_refindallrgr7)r%r8_rrrr\%sz*HTMLUnicodeInputStream.characterErrorsUCS4cCsd}t|D]}|rqt|}|}t|||drrt|||d}|tkrl|j dd}q|dkr|dkr|t |dkr|j dqd}|j dqdS)NFrTrzir ) rfinditerr}groupstartrisSurrogatePairsurrogatePairToCodepointnon_bmp_invalid_codepointsrgr7r()r%r8skipmatch codepointr)char_valrrrr])s"  z*HTMLUnicodeInputStream.characterErrorsUCS2Fc Cszt||f}Wnhtk rx|D]}t|dks$tq$ddd|D}|sZd|}td|}t||f<YnXg}||j|j }|dkr|j |j krqn0| }||j kr| |j|j |||_ q| |j|j d| s~qq~d|} | S)z Returns a string of characters from the stream up to but not including any character in 'characters' or EOF. 'characters' must be a container that supports the 'in' method and iteration over its characters. rdcSsg|]}dt|qS)z\x%02x)r})rcrrrrNsz5HTMLUnicodeInputStream.charsUntil..z^%sz[%s]+N)charsUntilRegExKeyErrorr}r-r:recompilerr*rfreendr7rw) r% charactersoppositecharsrregexr<mrrrrr charsUntil@s0    z!HTMLUnicodeInputStream.charsUntilcCsT|dk rP|jdkr.||j|_|jd7_n"|jd8_|j|j|ksPtdSr')rfr*rer-)r%rxrrrungetos   zHTMLUnicodeInputStream.unget)N)F)rArBrCrDr|r&rcrartr$rxrwr\r]rrrrrrrOs   & /rOc@sLeZdZdZdddZddZd d Zdd d Zd dZddZ ddZ dS)rPrVN windows-1252TcCsn|||_t||jd|_d|_||_||_||_||_ ||_ | ||_ |j ddk sbt |dS)rWidrN)ra rawStreamrOr& numBytesMetanumBytesChardetoverride_encodingtransport_encodingsame_origin_parent_encodinglikely_encodingdefault_encodingdetermineEncodingr`r-rc)r%rQrrrrr useChardetrrrr&s  zHTMLBinaryInputStream.__init__cCs&|jdj|jd|_t|dS)Nrr~)r` codec_info streamreaderrrbrOrcr6rrrrcszHTMLBinaryInputStream.resetcCsDt|dr|}nt|}z||Wnt|}YnX|Srk)rMrr0r+r!rlrrrras z HTMLBinaryInputStream.openStreamcCs|df}|ddk r|St|jdf}|ddk r:|St|jdf}|ddk rX|S|df}|ddk rt|St|jdf}|ddk r|djds|St|jdf}|ddk r|S|rpzddl m }Wnt k rYnXg}|}|j s<|j |j}t|tst|s&q<||||q|t|jd}|j d|dk rp|dfSt|jdf}|ddk r|StddfS)NrYr tentativezutf-16)UniversalDetectorencodingr) detectBOMr_rrdetectEncodingMetarname startswithr%pip._vendor.chardet.universaldetectorr ImportErrordonerr4rrHr3r-r7feedcloseresultr0r)r%chardetr`rbuffersdetectorr#rrrrrsR           z'HTMLBinaryInputStream.determineEncodingcCs|jddkstt|}|dkr&dS|jdkrFtd}|dk stnT||jdkrf|jddf|_n4|jd|df|_|td|jd|fdS)Nr rYutf-16beutf-16lerXrzEncoding changed from %s to %s)r`r-r_rrr0rcr)r% newEncodingrrrchangeEncodings   z$HTMLBinaryInputStream.changeEncodingc Cstjdtjdtjdtjdtjdi}|jd}t|t s"rPc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ e e e Z ddZe eZefddZddZddZddZdS) EncodingByteszString-like object with an associated position and various extra methods If the position is ever greater than the string length then an exception is raisedcCst|tstt||SN)rHr3r-__new__lowerr%valuerrrrLszEncodingBytes.__new__cCs d|_dS)Nr)rtrrrrr&PszEncodingBytes.__init__cCs|Srrr6rrr__iter__TszEncodingBytes.__iter__cCs>|jd}|_|t|kr"tn |dkr.t|||dS)Nr rrtr( StopIterationrNr%prrr__next__Ws  zEncodingBytes.__next__cCs|Sr)rr6rrrnext_szEncodingBytes.nextcCsB|j}|t|krtn |dkr$t|d|_}|||dSr'rrrrrpreviouscs zEncodingBytes.previouscCs|jt|krt||_dSrrtr(r)r%r$rrr setPositionlszEncodingBytes.setPositioncCs*|jt|krt|jdkr"|jSdSdS)Nrrr6rrr getPositionqs  zEncodingBytes.getPositioncCs||j|jdSNr )r$r6rrrgetCurrentByte{szEncodingBytes.getCurrentBytecCsH|j}|t|kr>|||d}||kr4||_|S|d7}q||_dS)zSkip past a list of charactersr Nr$r(rtr%rrrrrrrs  zEncodingBytes.skipcCsH|j}|t|kr>|||d}||kr4||_|S|d7}q||_dSrrrrrr skipUntils  zEncodingBytes.skipUntilcCs>|j}|||t|}||}|r:|jt|7_|S)zLook for a sequence of bytes at the start of a string. If the bytes are found return True and advance the position to the byte after the match. Otherwise return False and leave the position alone)r$r(r)r%r3rr8r<rrr matchBytess  zEncodingBytes.matchBytescCsR||jd|}|dkrJ|jdkr,d|_|j|t|d7_dStdS)zLook for the next sequence of bytes matching a given sequence. If a match is found advance the position to the last byte of the matchNrrr T)r$findrtr(r)r%r3 newPositionrrrjumpTos zEncodingBytes.jumpToN)rArBrCrDrr&rrrrrrpropertyr$r currentBytespaceCharactersBytesrrrrrrrrrHs      rc@sXeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ dS)rz?Mini parser for detecting character encoding from meta elementscCst||_d|_dS)z3string - the data to work on for encoding detectionN)rr8rr%r8rrrr&s zEncodingParser.__init__c Csd|jfd|jfd|jfd|jfd|jfd|jff}|jD]Z}d}|D]D\}}|j|rFz|}WqWqFtk rd}YqYqFXqF|s:qq:|jS) Nsr8rr6rrrrszEncodingParser.handleCommentcCs|jjtkrdSd}d}|}|dkr,dS|ddkr\|ddk}|r|dk r||_dSq|ddkr|d}t|}|dk r||_dSq|ddkrtt|d}|}|dk rt|}|dk r|r||_dS|}qdS) NTFrs http-equivr s content-typecharsetscontent) r8rr getAttributerr_ContentAttrParserrparse)r% hasPragmapendingEncodingattrtentativeEncodingcodec contentParserrrrrs8      zEncodingParser.handleMetacCs |dS)NF)handlePossibleTagr6rrrrsz%EncodingParser.handlePossibleStartTagcCst|j|dS)NT)rr8rr6rrrrs z#EncodingParser.handlePossibleEndTagcCsb|j}|jtkr(|r$||dS|t}|dkrD|n|}|dk r^|}qLdS)NTr)r8rasciiLettersBytesrrrspacesAngleBracketsr)r%endTagr8rrrrrrs    z EncodingParser.handlePossibleTagcCs |jdS)Nrrr6rrrrszEncodingParser.handleOthercCs|j}|ttdgB}|dks2t|dks2t|dkr>dSg}g}|dkrV|rVqnX|tkrj|}qnD|dkrd|dfS|tkr|| n|dkrdS||t |}qF|dkr| d|dfSt ||}|dkrJ|}t |}||kr"t |d|d|fS|tkr<|| q||qnJ|d krbd|dfS|tkr||| n|dkrdS||t |}|t krd|d|fS|tkr|| n|dkrdS||qdS) z_Return a name,value pair for the next attribute in the stream, if one is found, or None/Nr )rN=)rrr9)'"r) r8rr frozensetr(r-r:asciiUppercaseBytesr7rrrr)r%r8rattrName attrValue quoteCharrrrrsb             zEncodingParser.getAttributeN) rArBrCrDr&rrrrrrrrrrrrrs$rc@seZdZddZddZdS)rcCst|tst||_dSr)rHr3r-r8rrrrr&fszContentAttrParser.__init__cCsz|jd|jjd7_|j|jjdkss     "   JgIh6'