U v_a~ @sddlmZmZmZddlmZddlmZmZddl Z ddl Z ddl m Z m Z ddlmZddlmZmZmZmZdd lmZdd lmZed d eDZed d eDZedd eDZeeddgBZdZejreddkreddkst e !edde"ddZ#n e !eZ#dddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4h Z$e !d5Z%iZ&Gd6d7d7e'Z(d8d9Z)Gd:d;d;e'Z*Gdd?d?e,Z-Gd@dAdAe'Z.GdBdCdCe'Z/dDdEZ0dS)F)absolute_importdivisionunicode_literals) text_type) http_clienturllibN)BytesIOStringIO) webencodings)EOFspaceCharacters asciiLettersasciiUppercase)_ReparseException)_utilscCsg|]}|dqSasciiencode.0itemr/builddir/build/BUILDROOT/alt-python38-pip-20.2.4-1.el7.x86_64/opt/alt/python38/lib/python3.8/site-packages/pip/_vendor/html5lib/_inputstream.py srcCsg|]}|dqSrrrrrrrscCsg|]}|dqSrrrrrrrs>.)sumr"r$rrrr+XszBufferedStream._bufferedBytescCs<|j|}|j||jdd7<t||jd<|Sr&)r!r4r"appendr#r')r$r3datarrrr0[s   zBufferedStream._readStreamcCs|}g}|jd}|jd}|t|jkr|dkr|dks>t|j|}|t||krl|}|||g|_n"t||}|t|g|_|d7}|||||||8}d}q|r|||d|S)Nrr )r#r'r"r,r7r0join)r$r3remainingBytesrv bufferIndex bufferOffset bufferedData bytesToReadrrrr1bs&     zBufferedStream._readFromBufferN) __name__ __module__ __qualname____doc__r%r*r/r4r+r0r1rrrrr 3s  r cKst|tjs(t|tjjr.t|jtjr.d}n&t|drJt|dt }n t|t }|rdd|D}|rvt d|t |f|St |f|SdS)NFr4rcSsg|]}|dr|qS) _encoding)endswith)rxrrrrs z#HTMLInputStream..z3Cannot set an encoding with a unicode input, set %r) isinstancer HTTPResponserresponseaddbasefphasattrr4r TypeErrorHTMLUnicodeInputStreamHTMLBinaryInputStream)sourcekwargs isUnicode encodingsrrrHTMLInputStream}s       rUc@speZdZdZdZddZddZddZd d Zd d Z d dZ dddZ ddZ ddZ dddZddZdS)rOProvides a unicode stream of characters to the HTMLTokenizer. This class takes care of character encoding and removing or replacing incorrect byte-sequences and also provides column and line tracking. i(cCsZtjsd|_ntddkr$|j|_n|j|_dg|_tddf|_| ||_ | dS)Initialises the HTMLInputStream. HTMLInputStream(source, [encoding]) -> Normalized stream from source for use by html5lib. source can be either a file-object, local filename or a string. The optional encoding parameter must be a string that indicates the encoding. If specified, that encoding will be used, regardless of any BOM or later declaration (such as in a meta element) Nu􏿿r rutf-8certain) rsupports_lone_surrogatesreportCharacterErrorsr'characterErrorsUCS4characterErrorsUCS2newLineslookupEncoding charEncoding openStream dataStreamreset)r$rQrrrr%s   zHTMLUnicodeInputStream.__init__cCs.d|_d|_d|_g|_d|_d|_d|_dS)Nr)r) chunkSize chunkOffseterrors prevNumLines prevNumCols_bufferedCharacterr6rrrrcszHTMLUnicodeInputStream.resetcCst|dr|}nt|}|SzvProduces a file object from source. source can be either a file object, local filename or a string. r4)rMr r$rQr!rrrras z!HTMLUnicodeInputStream.openStreamcCsT|j}|dd|}|j|}|dd|}|dkr@|j|}n ||d}||fS)N rrr )r)countrhrfindri)r$r-r)nLines positionLine lastLinePospositionColumnrrr _positions   z HTMLUnicodeInputStream._positioncCs||j\}}|d|fS)z:Returns (line, col) of the current position in the stream.r )rtrf)r$linecolrrrr#szHTMLUnicodeInputStream.positioncCs6|j|jkr|stS|j}|j|}|d|_|S)zo Read one character from the stream or queue if available. Return EOF when EOF is reached. r )rfre readChunkr r))r$rfcharrrrrxs   zHTMLUnicodeInputStream.charNcCs|dkr|j}||j\|_|_d|_d|_d|_|j|}|j rX|j |}d|_ n|s`dSt |dkrt |d}|dksd|krdkrnn|d|_ |dd}|j r| || d d }| d d }||_t ||_d S) NrdrFr r iz rm T)_defaultChunkSizertrerhrir)rfrbr4rjr'ordr[replace)r$rer8lastvrrrrws0           z HTMLUnicodeInputStream.readChunkcCs(ttt|D]}|jdqdS)Ninvalid-codepoint)ranger'invalid_unicode_refindallrgr7)r$r8_rrrr\sz*HTMLUnicodeInputStream.characterErrorsUCS4cCsd}t|D]}|rqt|}|}t|||drrt|||d}|tkrl|j dd}q|dkr|dkr|t |dkr|j dqd}|j dqdS)NFrTrzir ) rfinditerr}groupstartrisSurrogatePairsurrogatePairToCodepointnon_bmp_invalid_codepointsrgr7r')r$r8skipmatch codepointr(char_valrrrr]#s"  z*HTMLUnicodeInputStream.characterErrorsUCS2Fc Cszt||f}Wnhtk rx|D]}t|dks$tq$ddd|D}|sZd|}td|}t||f<YnXg}||j|j }|dkr|j |j krqn0| }||j kr| |j|j |||_ q| |j|j d| s~qq~d|} | S)z Returns a string of characters from the stream up to but not including any character in 'characters' or EOF. 'characters' must be a container that supports the 'in' method and iteration over its characters. rdcSsg|]}dt|qS)z\x%02x)r})rcrrrrHsz5HTMLUnicodeInputStream.charsUntil..z^%sz[%s]+N)charsUntilRegExKeyErrorr}r,r:recompilerr)rfreendr7rw) r$ charactersoppositecharsrregexr<mrrrrr charsUntil:s0    z!HTMLUnicodeInputStream.charsUntilcCsT|tk rP|jdkr.||j|_|jd7_n"|jd8_|j|j|ksPtdSr&)r rfr)rer,)r$rxrrrungetis   zHTMLUnicodeInputStream.unget)N)F)rArBrCrDr|r%rcrartr#rxrwr\r]rrrrrrrOs   & /rOc@sLeZdZdZdddZddZd d Zdd d Zd dZddZ ddZ dS)rPrVN windows-1252TcCsn|||_t||jd|_d|_||_||_||_||_ ||_ | ||_ |j ddk sbt |dS)rWidrN)ra rawStreamrOr% numBytesMetanumBytesChardetoverride_encodingtransport_encodingsame_origin_parent_encodinglikely_encodingdefault_encodingdetermineEncodingr`r,rc)r$rQrrrrr useChardetrrrr%s  zHTMLBinaryInputStream.__init__cCs&|jdj|jd|_t|dS)Nrr~)r` codec_info streamreaderrrbrOrcr6rrrrcszHTMLBinaryInputStream.resetcCsLt|dr|}nt|}z||Wntk rFt|}YnX|Srk)rMrr/r* Exceptionr rlrrrras z HTMLBinaryInputStream.openStreamcCs|df}|ddk r|St|jdf}|ddk r:|St|jdf}|ddk rX|S|df}|ddk rt|St|jdf}|ddk r|djds|St|jdf}|ddk r|S|rpzddl m }Wnt k rYnXg}|}|j s<|j |j}t|tst|s&q<||||q|t|jd}|j d|dk rp|dfSt|jdf}|ddk r|StddfS)NrYr tentativezutf-16)UniversalDetectorencodingr) detectBOMr_rrdetectEncodingMetarname startswithr%pip._vendor.chardet.universaldetectorr ImportErrordonerr4rrHr3r,r7feedcloseresultr/r)r$chardetr`rbuffersdetectorr"rrrrrsR           z'HTMLBinaryInputStream.determineEncodingcCs|jddkstt|}|dkr&dS|jdkrFtd}|dk stnT||jdkrf|jddf|_n4|jd|df|_|td|jd|fdS)Nr rYutf-16beutf-16lerXrzEncoding changed from %s to %s)r`r,r_rrr/rcr)r$ newEncodingrrrchangeEncodings   z$HTMLBinaryInputStream.changeEncodingc Cstjdtjdtjdtjdtjdi}|jd}t|t s"rPc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ e e e Z ddZe eZefddZddZddZddZdS) EncodingByteszString-like object with an associated position and various extra methods If the position is ever greater than the string length then an exception is raisedcCst|tstt||SN)rHr3r,__new__lowerr$valuerrrrFszEncodingBytes.__new__cCs d|_dS)Nr)rtrrrrr%JszEncodingBytes.__init__cCs|Srrr6rrr__iter__NszEncodingBytes.__iter__cCs>|jd}|_|t|kr"tn |dkr.t|||dS)Nr rrtr' StopIterationrNr$prrr__next__Qs  zEncodingBytes.__next__cCs|Sr)rr6rrrnextYszEncodingBytes.nextcCsB|j}|t|krtn |dkr$t|d|_}|||dSr&rrrrrprevious]s zEncodingBytes.previouscCs|jt|krt||_dSrrtr'r)r$r#rrr setPositionfszEncodingBytes.setPositioncCs*|jt|krt|jdkr"|jSdSdS)Nrrr6rrr getPositionks  zEncodingBytes.getPositioncCs||j|jdSNr )r#r6rrrgetCurrentByteuszEncodingBytes.getCurrentBytecCsH|j}|t|kr>|||d}||kr4||_|S|d7}q||_dS)zSkip past a list of charactersr Nr#r'rtr$rrrrrrrzs  zEncodingBytes.skipcCsH|j}|t|kr>|||d}||kr4||_|S|d7}q||_dSrrrrrr skipUntils  zEncodingBytes.skipUntilcCs(|||j}|r$|jt|7_|S)zLook for a sequence of bytes at the start of a string. If the bytes are found return True and advance the position to the byte after the match. Otherwise return False and leave the position alone)rr#r')r$r3r<rrr matchBytesszEncodingBytes.matchBytescCs>z |||jt|d|_Wntk r8tYnXdS)zLook for the next sequence of bytes matching a given sequence. If a match is found advance the position to the last byte of the matchr T)indexr#r'rt ValueErrorrr2rrrjumpTos   zEncodingBytes.jumpToN)rArBrCrDrr%rrrrrrpropertyr#r currentBytespaceCharactersBytesrrrrrrrrrBs      rc@sXeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ dS)rz?Mini parser for detecting character encoding from meta elementscCst||_d|_dS)z3string - the data to work on for encoding detectionN)rr8rr$r8rrrr%s zEncodingParser.__init__c Csd|jkrdSd|jfd|jfd|jfd|jfd|jfd|jff}|jD]}d}z|jdWntk rzYqYnX|D]D\}}|j|rz|}WqWqtk rd}YqYqXq|sHqqH|j S) Nsr8rr6rrrrszEncodingParser.handleCommentcCs|jjtkrdSd}d}|}|dkr,dS|ddkr\|ddk}|r|dk r||_dSq|ddkr|d}t|}|dk r||_dSq|ddkrtt|d}|}|dk rt|}|dk r|r||_dS|}qdS) NTFrs http-equivr s content-typecharsetscontent) r8rr getAttributerr_ContentAttrParserrparse)r$ hasPragmapendingEncodingattrtentativeEncodingcodec contentParserrrrrs8      zEncodingParser.handleMetacCs |dS)NF)handlePossibleTagr6rrrrsz%EncodingParser.handlePossibleStartTagcCst|j|dS)NT)rr8rr6rrrrs z#EncodingParser.handlePossibleEndTagcCsb|j}|jtkr(|r$||dS|t}|dkrD|n|}|dk r^|}qLdS)NTr)r8rasciiLettersBytesrrrspacesAngleBracketsr)r$endTagr8rrrrrrs    z EncodingParser.handlePossibleTagcCs |jdS)Nrrr6rrrrszEncodingParser.handleOthercCs|j}|ttdgB}|dks2t|dks2t|dkr>dSg}g}|dkrV|rVqnX|tkrj|}qnD|dkrd|dfS|tkr|| n|dkrdS||t |}qF|dkr| d|dfSt ||}|dkrJ|}t |}||kr"t |d|d|fS|tkr<|| q||qnJ|d krbd|dfS|tkr||| n|dkrdS||t |}|t krd|d|fS|tkr|| n|dkrdS||qdS) z_Return a name,value pair for the next attribute in the stream, if one is found, or None/Nr )rN=)rrr9)'"r) r8rr frozensetr'r,r:asciiUppercaseBytesr7rrrr)r$r8rattrName attrValue quoteCharrrrrsb             zEncodingParser.getAttributeN) rArBrCrDr%rrrrrrrrrrrrrs$rc@seZdZddZddZdS)rcCst|tst||_dSr)rHr3r,r8rrrrr%aszContentAttrParser.__init__cCsz|jd|jjd7_|j|jjdkss       JgIb='