比思論壇

gaojunyangde's personal space http://e3-1230v2.bl-phx0.1.141.8.2.e5.securedservers.com/?928411

Diary

Master Huang's diary of entrepreneurship (March 31, 2019)

2559 views | 2019-3-31 22:47 | Personal category: Entrepreneurship diary

1. From now on, I want to use a standard, simple, internationally common language to record all of my work, and the best choice should be English...... This is probably why all of our kids are studying it day and night, hehe...... Although it's difficult at first, I'm sure I'll gradually get the hang of it. As the saying goes, "The easiest way to learn English is to use it every day." Now let me try this method!!! So come on, Master Huang!!! (And the first thing to translate is this very paragraph, hehe)

2. Ahahahaha......, I feel so good. I'm sure it feels better than sex (OOOO......, even in English I'm still this dirty-minded and crude). Now I better understand what that buddy said: when you fight for the dream in your heart and the work itself is highly technical, you are the happiest person in the whole world.

3.So let's begin our great work......
    According to my current design, there are 4 data flows in my logic. Of course, these are just the large, basic workflows; there will be more, smaller ones. I must research them and work them out in detail now, because their quality determines the quality of the overall process. They are as follows:
    (Automation can be based on a Web-Principle-Perspective or on a User-Perspective. My logic also has 2 concepts: Network-Collector and Network-Publisher. Combining each perspective with each concept gives 2 × 2 = 4 cases.)
    (1).A Web-Principle-Perspective based Network-Collector: Can be implemented directly with Scrapy.
    (2).A Web-Principle-Perspective based Network-Publisher: Can also be implemented with Scrapy, though some code adjustments may be required, because using Scrapy's source code as-is to implement a Network-Publisher would be confusing.
    (3).A User-Perspective based Network-Collector: Scrapy and Selenium must be organically integrated, and Scrapy's architecture needs to be redesigned. This will be very difficult, technical and challenging work, but I'm a person who likes to challenge myself, haha......
    (4).A User-Perspective based Network-Publisher: Ditto......
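The four cases are the Cartesian product of two perspectives and two roles (2 × 2 = 4), not a permutation. A minimal sketch that enumerates them in the same order as the list; the short tooling labels simply restate the notes above and are illustrative only.

```python
from itertools import product

# The two axes described in the diary entry.
PERSPECTIVES = ("Web-Principle-Perspective", "User-Perspective")
ROLES = ("Network-Collector", "Network-Publisher")

# Tooling notes restated from cases (1)-(4); the wording of the labels is illustrative.
TOOLING = {
    ("Web-Principle-Perspective", "Network-Collector"): "Scrapy as-is",
    ("Web-Principle-Perspective", "Network-Publisher"): "Scrapy with code adjustments",
    ("User-Perspective", "Network-Collector"): "Scrapy + Selenium, redesigned architecture",
    ("User-Perspective", "Network-Publisher"): "Scrapy + Selenium, redesigned architecture",
}


def enumerate_cases():
    """Return every (perspective, role) pair: 2 x 2 = 4 cases, in list order."""
    return list(product(PERSPECTIVES, ROLES))


if __name__ == "__main__":
    for number, case in enumerate(enumerate_cases(), start=1):
        print(f"({number}) {case[0]} based {case[1]}: {TOOLING[case]}")
```

`itertools.product` varies the last axis fastest, so the output order matches cases (1) to (4).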

4. Let's start with Scrapy's architecture. Although Scrapy can smoothly implement the functions I want, good code organization is still essential when writing a crawler. We need to keep in mind that when a program grows large, a small logic problem can make it hard to maintain.
    Scrapy's workflow is as follows (from the documentation on the official website):
        The data flow in Scrapy is controlled by the execution engine, and goes like this:
            <1>.The Engine gets the initial Requests to crawl from the Spider.
            <2>.The Engine schedules the Requests in the Scheduler and asks for the next Requests to crawl.
            <3>.The Scheduler returns the next Requests to the Engine.
            <4>.The Engine sends the Requests to the Downloader, passing through the Downloader Middlewares.
            <5>.Once the page finishes downloading, the Downloader generates a Response with that page and sends it to the Engine, passing through the Downloader Middlewares.
            <6>.The Engine receives the Response from the Downloader and sends it to the Spider for processing, passing through the Spider Middleware.
            <7>.The Spider processes the Response and returns scraped items and new Requests to follow to the Engine, passing through the Spider Middleware.
            <8>.The Engine sends processed items to the Item Pipelines, then sends processed Requests to the Scheduler and asks for possible next Requests to crawl.
            <9>.The process repeats (from step 3) until there are no more Requests from the Scheduler.
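The data flow above can be modelled in plain Python to make the loop concrete. This is a single-threaded toy, not Scrapy's code: real Scrapy is asynchronous (built on Twisted), runs middlewares between the components, and deduplicates requests through a configurable filter. `FAKE_SITE`, `Scheduler`, `download`, `parse` and `run_engine` are hypothetical names used only for this sketch.

```python
from collections import deque

# Hypothetical in-memory "website": URL -> (page text, links found on that page).
FAKE_SITE = {
    "http://example.com/": ("home", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", []),
}


class Scheduler:
    """Queues URLs first-in, first-out and drops URLs it has already seen."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def enqueue(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next_request(self):
        # Returns None when nothing is left, which ends the crawl (step 9).
        return self._queue.popleft() if self._queue else None


def download(url):
    """Stand-in for the Downloader: looks the page up in FAKE_SITE."""
    text, links = FAKE_SITE[url]
    return {"url": url, "text": text, "links": links}


def parse(response):
    """Stand-in for a Spider callback: yields one item, then follow-up requests."""
    yield {"kind": "item", "url": response["url"], "title": response["text"]}
    for link in response["links"]:
        yield {"kind": "request", "url": link}


def run_engine(start_urls):
    """Toy Engine loop; middlewares are omitted."""
    scheduler = Scheduler()
    pipeline = []  # stand-in for the Item Pipeline: just collects items
    for url in start_urls:              # step 1: initial requests from the spider
        scheduler.enqueue(url)          # step 2: schedule them
    while True:
        url = scheduler.next_request()  # step 3: scheduler hands back the next request
        if url is None:                 # step 9: stop when the scheduler is empty
            break
        response = download(url)        # steps 4-5: downloader builds a response
        for output in parse(response):  # steps 6-7: spider processes the response
            if output["kind"] == "item":
                pipeline.append(output)           # step 8: items go to the pipeline
            else:
                scheduler.enqueue(output["url"])  # step 8: new requests go to the scheduler
    return pipeline


if __name__ == "__main__":
    for item in run_engine(["http://example.com/"]):
        print(item["url"], "->", item["title"])
```

Starting from the home page, the three pages come out in crawl order (home, page a, page b), and page b is fetched only once even though two pages link to it, because the toy scheduler remembers the URLs it has seen.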

5. For the last 3 cases, I hope to find a common solution. I took a nap this afternoon and thought of a good idea for this...... I can write a new Downloader using Selenium and pass the necessary parameters through the "meta" field (I wonder if I got that right, hehe...); anyway, there must be some field we can use to pass the information.
    After that I went online for a while and read up on several Scrapy APIs on the official website. I found that I can handle all of this well. I also had a look at Scrapy's source code; there isn't that much of it, so I decided to read and understand all of it before I start programming...... Let's begin!!!
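In Scrapy, the field that can carry such per-request information is `Request.meta`, a dict attached to the Request object inside the framework rather than an HTTP header. Instead of replacing the whole Downloader, one way to realize the idea is a downloader middleware: when its `process_request(request, spider)` returns a Response, Scrapy skips the normal download for that request. The sketch below models that routing with stand-ins so it runs without Scrapy or Selenium installed; the `use_browser` meta key, `FakeBrowser`, `BrowserMiddleware`, `plain_download` and `fetch` are all hypothetical. In a real project the middleware would wrap a Selenium WebDriver, build a `scrapy.http.HtmlResponse` from `driver.page_source`, and be enabled through the `DOWNLOADER_MIDDLEWARES` setting.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Minimal stand-in for scrapy.Request: a URL plus a free-form meta dict."""
    url: str
    meta: dict = field(default_factory=dict)


@dataclass
class Response:
    url: str
    body: str
    rendered_by: str  # which path produced this response, for illustration


class FakeBrowser:
    """Hypothetical stand-in for a Selenium WebDriver (get() plus page_source)."""

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        # A real driver would load the page and execute its JavaScript here.
        self.page_source = f"<html>rendered {url}</html>"


class BrowserMiddleware:
    """Same shape as a Scrapy downloader middleware's process_request()."""

    def __init__(self, browser):
        self.browser = browser

    def process_request(self, request, spider=None):
        if not request.meta.get("use_browser"):  # hypothetical meta key
            return None  # None: let the normal download path handle it
        self.browser.get(request.url)
        return Response(request.url, self.browser.page_source, "browser")


def plain_download(request):
    """Stand-in for the default HTTP download."""
    return Response(request.url, f"<html>raw {request.url}</html>", "http")


def fetch(request, middlewares):
    # The first middleware that returns a Response short-circuits the download.
    for middleware in middlewares:
        response = middleware.process_request(request)
        if response is not None:
            return response
    return plain_download(request)


if __name__ == "__main__":
    chain = [BrowserMiddleware(FakeBrowser())]
    print(fetch(Request("http://example.com/js", meta={"use_browser": True}), chain))
    print(fetch(Request("http://example.com/static"), chain))
```

The spider decides per request which path to take just by setting a meta flag, so the same engine loop can serve both the Web-Principle and User-Perspective cases.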
