"""
The main purpose of this module is to expose LinkCollector.collect_links().
"""

# NOTE: reconstructed from a compiled-module (.pyc) dump. The docstrings and
# log-message strings below appear verbatim in the dump; the function bodies
# are a best-effort sketch of the corresponding pip 20.x source.

import cgi
import functools
import itertools
import logging
import mimetypes
import os
import re
from collections import OrderedDict

from pip._vendor import html5lib, requests
from pip._vendor.distlib.compat import unescape
from pip._vendor.requests.exceptions import HTTPError, RetryError, SSLError
from pip._vendor.six.moves.urllib import parse as urllib_parse
from pip._vendor.six.moves.urllib import request as urllib_request

from pip._internal.models.link import Link
from pip._internal.utils.filetypes import ARCHIVE_EXTENSIONS
from pip._internal.utils.misc import pairwise, redact_auth_from_url
from pip._internal.utils.typing import MYPY_CHECK_RUNNING
from pip._internal.utils.urls import path_to_url, url_to_path
from pip._internal.vcs import is_url, vcs

if MYPY_CHECK_RUNNING:
    from typing import (
        Callable, Iterable, List, MutableMapping, Optional,
        Protocol, Sequence, Tuple, TypeVar, Union,
    )
    import xml.etree.ElementTree

    from pip._vendor.requests import Response

    from pip._internal.models.search_scope import SearchScope
    from pip._internal.network.session import PipSession

    HTMLElement = xml.etree.ElementTree.Element
    ResponseHeaders = MutableMapping[str, str]

    F = TypeVar('F')

    class LruCache(Protocol):
        def __call__(self, maxsize=None):
            raise NotImplementedError


logger = logging.getLogger(__name__)


def noop_lru_cache(maxsize=None):
    """Fallback for Python interpreters without functools.lru_cache."""
    def _wrapper(f):
        return f
    return _wrapper


_lru_cache = getattr(functools, "lru_cache", noop_lru_cache)


def _match_vcs_scheme(url):
    """Look for VCS schemes in the URL.

    Returns the matched VCS scheme, or None if there's no match.
    """
    for scheme in vcs.schemes:
        if url.lower().startswith(scheme) and url[len(scheme)] in '+:':
            return scheme
    return None


def _is_url_like_archive(url):
    """Return whether the URL looks like an archive.
    """
    filename = Link(url).filename
    for bad_ext in ARCHIVE_EXTENSIONS:
        if filename.endswith(bad_ext):
            return True
    return False


class _NotHTML(Exception):
    def __init__(self, content_type, request_desc):
        super(_NotHTML, self).__init__(content_type, request_desc)
        self.content_type = content_type
        self.request_desc = request_desc


def _ensure_html_header(response):
    """Check the Content-Type header to ensure the response contains HTML.

    Raises `_NotHTML` if the content type is not text/html.
    """
    content_type = response.headers.get("Content-Type", "")
    if not content_type.lower().startswith("text/html"):
        raise _NotHTML(content_type, response.request.method)


class _NotHTTP(Exception):
    pass


def _ensure_html_response(url, session):
    """Send a HEAD request to the URL, and ensure the response contains HTML.

    Raises `_NotHTTP` if the URL is not available for a HEAD request, or
    `_NotHTML` if the content type is not text/html.
    """
    scheme, netloc, path, query, fragment = urllib_parse.urlsplit(url)
    if scheme not in {'http', 'https'}:
        raise _NotHTTP()

    resp = session.head(url, allow_redirects=True)
    resp.raise_for_status()

    _ensure_html_header(resp)


def _get_html_response(url, session):
    """Access an HTML page with GET, and return the response.

    This consists of three parts:

    1. If the URL looks suspiciously like an archive, send a HEAD first to
       check the Content-Type is HTML, to avoid downloading a large file.
       Raise `_NotHTTP` if the content type cannot be determined, or
       `_NotHTML` if it is not HTML.
    2. Actually perform the request. Raise HTTP exceptions on network failures.
    3. Check the Content-Type header to make sure we got HTML, and raise
       `_NotHTML` otherwise.
    """
    if _is_url_like_archive(url):
        _ensure_html_response(url, session=session)

    logger.debug('Getting page %s', redact_auth_from_url(url))

    resp = session.get(
        url,
        headers={
            "Accept": "text/html",
            "Cache-Control": "max-age=0",
        },
    )
    resp.raise_for_status()

    _ensure_html_header(resp)

    return resp


def _get_encoding_from_headers(headers):
    """Determine if we have any encoding information in our headers.
    """
    if headers and "Content-Type" in headers:
        content_type, params = cgi.parse_header(headers["Content-Type"])
        if "charset" in params:
            return params['charset']
    return None


def _determine_base_url(document, page_url):
    """Determine the HTML document's base URL.

    This looks for a ``<base>`` tag in the HTML document. If present, its href
    attribute denotes the base URL of anchor tags in the document. If there is
    no such tag (or if it does not have a valid href attribute), the HTML
    file's URL is used as the base URL.

    :param document: An HTML document representation. The current
        implementation expects the result of ``html5lib.parse()``.
    :param page_url: The URL of the HTML document.
    """
    for base in document.findall(".//base"):
        href = base.get("href")
        if href is not None:
            return href
    return page_url


def _clean_url_path_part(part):
    """
    Clean a "part" of a URL path (i.e. after splitting on "@" characters).
    """
    # We unquote prior to quoting to make sure nothing is double quoted.
    return urllib_parse.quote(urllib_parse.unquote(part))


def _clean_file_url_path(part):
    """
    Clean the first part of a URL path that corresponds to a local
    filesystem path (i.e. the first part after splitting on "@" characters).
    """
    # We unquote prior to quoting to make sure nothing is double quoted.
    # Also, on Windows the path part might contain a drive letter which
    # should not be quoted; we rely on urllib.request to handle that.
    return urllib_request.pathname2url(urllib_request.url2pathname(part))


# percent-encoded: /
_reserved_chars_re = re.compile('(@|%2F)', re.IGNORECASE)


def _clean_url_path(path, is_local_path):
    """
    Clean the path portion of a URL.
    """
    if is_local_path:
        clean_func = _clean_file_url_path
    else:
        clean_func = _clean_url_path_part

    # Split on the reserved characters prior to cleaning so that
    # revision strings in VCS URLs are properly preserved.
    parts = _reserved_chars_re.split(path)

    cleaned_parts = []
    for to_clean, reserved in pairwise(itertools.chain(parts, [''])):
        cleaned_parts.append(clean_func(to_clean))
        # Normalize %xx escapes (e.g. %2f -> %2F)
        cleaned_parts.append(reserved.upper())

    return ''.join(cleaned_parts)


def _clean_link(url):
    """
    Make sure a link is fully quoted.
    For example, if ' ' occurs in the URL, it will be replaced with "%20",
    and without double-quoting other characters.
    """
    result = urllib_parse.urlparse(url)
    # If the netloc is empty, the URL refers to a local filesystem path.
    is_local_path = not result.netloc
    path = _clean_url_path(result.path, is_local_path=is_local_path)
    return urllib_parse.urlunparse(result._replace(path=path))


def _create_link_from_element(anchor, page_url, base_url):
    """
    Convert an anchor element in a simple repository page to a Link.
    """
    href = anchor.get("href")
    if not href:
        return None

    url = _clean_link(urllib_parse.urljoin(base_url, href))
    pyrequire = anchor.get('data-requires-python')
    pyrequire = unescape(pyrequire) if pyrequire else None

    yanked_reason = anchor.get('data-yanked')
    if yanked_reason:
        yanked_reason = unescape(yanked_reason)

    link = Link(
        url,
        comes_from=page_url,
        requires_python=pyrequire,
        yanked_reason=yanked_reason,
    )
    return link


class CacheablePageContent(object):
    def __init__(self, page):
        assert page.cache_link_parsing
        self.page = page

    def __eq__(self, other):
        return (isinstance(other, type(self)) and
                self.page.url == other.page.url)

    def __hash__(self):
        return hash(self.page.url)


def with_cached_html_pages(fn):
    """
    Given a function that parses an Iterable[Link] from an HTMLPage, cache the
    function's result (keyed by CacheablePageContent), unless the HTMLPage
    `page` has `page.cache_link_parsing == False`.
    """
    @_lru_cache(maxsize=None)
    def wrapper(cacheable_page):
        return list(fn(cacheable_page.page))

    @functools.wraps(fn)
    def wrapper_wrapper(page):
        if page.cache_link_parsing:
            return wrapper(CacheablePageContent(page))
        return list(fn(page))

    return wrapper_wrapper


@with_cached_html_pages
def parse_links(page):
    """
    Parse an HTML document, and yield its anchor elements as Link objects.
    """
    document = html5lib.parse(
        page.content,
        transport_encoding=page.encoding,
        namespaceHTMLElements=False,
    )

    url = page.url
    base_url = _determine_base_url(document, url)
    for anchor in document.findall(".//a"):
        link = _create_link_from_element(
            anchor,
            page_url=url,
            base_url=base_url,
        )
        if link is None:
            continue
        yield link


class HTMLPage(object):
    """Represents one page, along with its URL"""

    def __init__(self, content, encoding, url, cache_link_parsing=True):
        """
        :param encoding: the encoding to decode the given content.
        :param url: the URL from which the HTML was downloaded.
        :param cache_link_parsing: whether links parsed from this page's url
                                   should be cached. PyPI index urls should
                                   have this set to False, for example.
        """
        self.content = content
        self.encoding = encoding
        self.url = url
        self.cache_link_parsing = cache_link_parsing

    def __str__(self):
        return redact_auth_from_url(self.url)


def _handle_get_page_fail(link, reason, meth=None):
    if meth is None:
        meth = logger.debug
    meth("Could not fetch URL %s: %s - skipping", link, reason)


def _make_html_page(response, cache_link_parsing=True):
    encoding = _get_encoding_from_headers(response.headers)
    return HTMLPage(
        response.content,
        encoding=encoding,
        url=response.url,
        cache_link_parsing=cache_link_parsing,
    )


def _get_html_page(link, session=None):
    if session is None:
        raise TypeError(
            "_get_html_page() missing 1 required keyword argument: 'session'"
        )

    url = link.url.split('#', 1)[0]

    # Check for VCS schemes that do not support lookup as web pages.
    vcs_scheme = _match_vcs_scheme(url)
    if vcs_scheme:
        logger.debug('Cannot look at %s URL %s', vcs_scheme, link)
        return None

    # Tack index.html onto file:// URLs that point to directories.
    scheme, _, path, _, _, _ = urllib_parse.urlparse(url)
    if scheme == 'file' and os.path.isdir(urllib_request.url2pathname(path)):
        # Add a trailing slash if not present so urljoin doesn't trim the
        # final segment.
        if not url.endswith('/'):
            url += '/'
        url = urllib_parse.urljoin(url, 'index.html')
        logger.debug(' file: URL is directory, getting %s', url)

    try:
        resp = _get_html_response(url, session=session)
    except _NotHTTP:
        logger.debug(
            'Skipping page %s because it looks like an archive, and cannot '
            'be checked by HEAD.', link,
        )
    except _NotHTML as exc:
        logger.debug(
            'Skipping page %s because the %s request got Content-Type: %s',
            link, exc.request_desc, exc.content_type,
        )
    except HTTPError as exc:
        _handle_get_page_fail(link, exc)
    except RetryError as exc:
        _handle_get_page_fail(link, exc)
    except SSLError as exc:
        reason = "There was a problem confirming the ssl certificate: "
        reason += str(exc)
        _handle_get_page_fail(link, reason, meth=logger.info)
    except requests.ConnectionError as exc:
        _handle_get_page_fail(link, "connection error: {}".format(exc))
    except requests.Timeout:
        _handle_get_page_fail(link, "timed out")
    else:
        return _make_html_page(
            resp, cache_link_parsing=link.cache_link_parsing,
        )
    return None


def _remove_duplicate_links(links):
    """
    Return a list of links, with duplicates removed and ordering preserved.
    """
    # We preserve the ordering when removing duplicates because we can.
    return list(OrderedDict.fromkeys(links))


def group_locations(locations, expand_dir=False):
    """
    Divide a list of locations into two groups: "files" (archives) and "urls."

    :return: A pair of lists (files, urls).
    """
    files = []
    urls = []

    # Puts the url for the given file path into the appropriate list.
    def sort_path(path):
        url = path_to_url(path)
        if mimetypes.guess_type(url, strict=False)[0] == 'text/html':
            urls.append(url)
        else:
            files.append(url)

    for url in locations:
        is_local_path = os.path.exists(url)
        is_file_url = url.startswith('file:')

        if is_local_path or is_file_url:
            if is_local_path:
                path = url
            else:
                path = url_to_path(url)
            if os.path.isdir(path):
                if expand_dir:
                    path = os.path.realpath(path)
                    for item in os.listdir(path):
                        sort_path(os.path.join(path, item))
                elif is_file_url:
                    urls.append(url)
                else:
                    logger.warning(
                        "Path '{0}' is ignored: "
                        "it is a directory.".format(path),
                    )
            elif os.path.isfile(path):
                sort_path(path)
            else:
                logger.warning(
                    "Url '%s' is ignored: it is neither a file "
                    "nor a directory.", url,
                )
        elif is_url(url):
            # Only add url with clear scheme.
            urls.append(url)
        else:
            logger.warning(
                "Url '%s' is ignored. It is either a non-existing "
                "path or lacks a specific scheme.", url,
            )

    return files, urls


class CollectedLinks(object):
    """
    Encapsulates the return value of a call to LinkCollector.collect_links().

    The return value includes both URLs to project pages containing package
    links, as well as individual package Link objects collected from other
    sources.

    This info is stored separately as:

    (1) links from the configured file locations,
    (2) links from the configured find_links, and
    (3) urls to HTML project pages, as described by the PEP 503 simple
        repository API.
    """

    def __init__(self, files, find_links, project_urls):
        """
        :param files: Links from file locations.
        :param find_links: Links from find_links.
        :param project_urls: URLs to HTML project pages, as described by the
            PEP 503 simple repository API.
        """
        self.files = files
        self.find_links = find_links
        self.project_urls = project_urls


class LinkCollector(object):
    """
    Responsible for collecting Link objects from all configured locations,
    making network requests as needed.

    The class's main method is its collect_links() method.
    """

    def __init__(self, session, search_scope):
        self.search_scope = search_scope
        self.session = session

    @property
    def find_links(self):
        return self.search_scope.find_links

    def fetch_page(self, location):
        """
        Fetch an HTML page containing package links.
        """
        return _get_html_page(location, session=self.session)

    def collect_links(self, project_name):
        """Find all available links for the given project name.

        :return: All the Link objects (unfiltered), as a CollectedLinks object.
        """
        search_scope = self.search_scope
        index_locations = search_scope.get_index_urls_locations(project_name)
        index_file_loc, index_url_loc = group_locations(index_locations)
        fl_file_loc, fl_url_loc = group_locations(
            self.find_links, expand_dir=True,
        )

        file_links = [
            Link(url) for url in itertools.chain(index_file_loc, fl_file_loc)
        ]

        # We trust every directly linked archive in find_links.
        find_link_links = [Link(url, '-f') for url in self.find_links]

        # We trust every url that the user has given us whether it was given
        # via --index-url or --find-links.
        url_locations = [
            link for link in itertools.chain(
                # Mark PyPI indices as "cache_link_parsing == False" -- this
                # will avoid caching the result of parsing the page for links.
                (Link(url, cache_link_parsing=False)
                 for url in index_url_loc),
                (Link(url) for url in fl_url_loc),
            )
            if self.session.is_secure_origin(link)
        ]

        url_locations = _remove_duplicate_links(url_locations)
        lines = [
            '{} location(s) to search for versions of {}:'.format(
                len(url_locations), project_name,
            ),
        ]
        for link in url_locations:
            lines.append('* {}'.format(link))
        logger.debug('\n'.join(lines))

        return CollectedLinks(
            files=file_links,
            find_links=find_link_links,
            project_urls=url_locations,
        )
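The `_clean_link`/`_clean_url_path` pair quotes each path segment while leaving the reserved separators (`@` and an encoded `/`, `%2F`) untouched, unquoting before quoting so already-encoded text is not double-encoded. A self-contained stdlib-only sketch of that technique (function names here are illustrative, not pip's internal API):

```python
import itertools
import re
import urllib.parse
from itertools import zip_longest

# Reserved separators: '@' (e.g. a VCS revision marker) and an encoded '/'.
_reserved_chars_re = re.compile('(@|%2F)', re.IGNORECASE)


def _pairwise(iterable):
    """s -> (s0, s1), (s2, s3), ...  (non-overlapping pairs)."""
    it = iter(iterable)
    return zip_longest(it, it)


def _clean_part(part):
    # Unquote before quoting so nothing is double-quoted:
    # 'c%20d' -> 'c d' -> 'c%20d', not 'c%2520d'.
    return urllib.parse.quote(urllib.parse.unquote(part))


def clean_url_path(path):
    # Split on the reserved separators first so that, e.g., the '@revision'
    # suffix of a VCS URL passes through verbatim.
    parts = _reserved_chars_re.split(path)
    cleaned = []
    for to_clean, reserved in _pairwise(itertools.chain(parts, [''])):
        cleaned.append(_clean_part(to_clean))
        cleaned.append(reserved.upper())  # normalize escapes: %2f -> %2F
    return ''.join(cleaned)
```

Because `re.split` with a capturing group alternates text and separator, appending one `''` makes the list even-length, so each non-overlapping pair is exactly (segment, separator).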
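`parse_links()` leans on two details: a `<base href>` tag, when present, overrides the page URL as the base for anchor hrefs (`_determine_base_url`), and each anchor's `data-requires-python` attribute is HTML-unescaped. A rough sketch of the same idea using only the standard library's `html.parser` instead of html5lib (`SimpleIndexParser` is a hypothetical name):

```python
import urllib.parse
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect (url, requires-python) pairs from a simple-index page."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url  # page URL is the default base
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'base' and attrs.get('href'):
            # A <base href> replaces the page URL as the join base.
            self.base_url = attrs['href']
        elif tag == 'a' and attrs.get('href'):
            url = urllib.parse.urljoin(self.base_url, attrs['href'])
            # html.parser already unescapes entities in attribute values,
            # so data-requires-python arrives as e.g. '>=3.6'.
            self.links.append((url, attrs.get('data-requires-python')))


page = (
    '<html><head><base href="https://files.example.com/"></head><body>'
    '<a href="pkg-1.0.tar.gz" data-requires-python="&gt;=3.6">pkg-1.0</a>'
    '</body></html>'
)
parser = SimpleIndexParser('https://index.example.com/simple/pkg/')
parser.feed(page)
```

Unlike html5lib, this sketch processes tags in document order, so a `<base>` appearing after an anchor would not affect it; pip scans the whole parsed tree first.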
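`with_cached_html_pages` caches parse results keyed by a small proxy object (pip's `CacheablePageContent`) whose equality and hash delegate to the page URL, so an otherwise-unhashable page can key a `functools.lru_cache`. A minimal sketch of that pattern, with a plain dict standing in for the page object:

```python
import functools


class _CacheKey:
    """Hashable stand-in for a page: equality/hash delegate to its URL."""

    def __init__(self, page):
        self.page = page

    def __eq__(self, other):
        return (isinstance(other, type(self))
                and self.page['url'] == other.page['url'])

    def __hash__(self):
        return hash(self.page['url'])


parse_calls = []  # records how often the real parser runs


def with_cached_pages(fn):
    @functools.lru_cache(maxsize=None)
    def cached(key):
        # lru_cache needs hashable arguments, hence the key object.
        return list(fn(key.page))

    @functools.wraps(fn)
    def wrapper(page):
        if page.get('cacheable', True):
            return cached(_CacheKey(page))
        return list(fn(page))

    return wrapper


@with_cached_pages
def parse(page):
    parse_calls.append(page['url'])
    yield from page['links']


page = {'url': 'https://example.com/simple/pkg/', 'links': ['a', 'b']}
first = parse(page)
second = parse(page)  # cache hit: the generator body does not run again
```

The decorator also materializes the generator into a list, so cached and uncached calls return the same type.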