
PHP: How can I accurately extract all hyperlinks from a website?

Posted: 2022-01-20 12:07:53


Backend development | PHP tutorials


I want to collect every hyperlink on a website, and I am using the PHP Snoopy class to do it:


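require 'Snoopy.class.php';      // path depends on where Snoopy is installed
$snoopy = new Snoopy;
$sourceURL = $url;
$snoopy->fetchlinks($sourceURL); // fetch the page and extract its links
$content = $snoopy->results;     // array of extracted URLs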

The result looks like this:


array (size=627)
  0 => string // (length=49)
  1 => string http://sh.?tracelog=nav_ma (length=41)
  2 => string /feedback/default.htm?routeto=inbox&tracelog=nav_ma_mc (length=80)
  3 => string //hz-/favorite/favorite_home.htm?tracelog=nav_ma_fav (length=94)
  4 => string /form.htm?tracelog=header_myalibaba (length=57)
  5 => string http://hz./rfq/request/rfq_manage_list.htm?tracelog=nav_ma_mana_rfq (length=87)
  6 => string /generalorders/list_orders.htm?tracelog=ma_mana_orders (length=76)
  7 => string http://sh./product/post_product_interface.htm?tracelog=newschp_nav_madp (length=86)
  8 => string http://sh./product/manage_products.htm?tracelog=newschp_nav_mamng (length=80)
  9 => string http://hz./rfq/quotation/rfq_not_quoted_manage_list.htm?nav_ma_rec_rfqs (length=91)
  10 => string /javascript:; (length=35)
  11 => string /Products?tracelog=beacon_cate_140704 (length=59)
  12 => string /form.htm?tracelog=header_forbuyers (length=57)
  13 => string ?tracelog=beacon_expo_150820 (length=57)
  14 => string ?tracelog=nav_ws (length=44)
  15 => string /bizid_buyer?tracelog=nav_bi (length=52)
  16 => string /bao/buyer_advertise.htm?tracelog=from_home_menu (length=81)
  17 => string /alibaba/secure-payment.php?tracelog=beacon_payment_150114 (length=87)
  18 => string /ecl/buyer.htm?tracelog=beacon_credit_140704 (length=70)
  19 => string /?tracelog=beacon_is_140704 (length=56)
  20 => string /intelligence?tracelog=beacon_ti_140704 (length=63)
  21 => string /forum?tracelog=beacon_df_140704 (length=56)
  22 => string /?tracelog=beacon_ta_140704 (length=49)
  23 => string /javascript:; (length=35)
  24 => string /memberships/index.html?tracelog=seller_channel_member_hp_header (length=89)
  25 => string /learningcenter?tracelog=seller_channel_lc_hp_header (length=77)
  26 => string /training.htm?tracelog=seller_channel_training_hp_header (length=81)
  27 => string /?tracelog=newschp_nav_narfq (length=55)
  28 => string /javascript:; (length=35)

How can I remove entries like "/javascript:;" from the list?
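One possible way to drop such entries is to post-filter the array in plain PHP. The sketch below is only an illustration: filterLinks() is a hypothetical helper, not part of Snoopy, and the scheme list (javascript, mailto, tel) is an assumption about what counts as a non-navigable link. Because the output above shows "javascript:;" appended after a base URL, the pattern looks for the scheme either at the start of the string or right after a slash. It is a heuristic and could misfire on an unusual path that legitimately contains such a segment.

```php
<?php
// Hypothetical helper (not part of Snoopy): remove entries that are not
// real, navigable links, such as "javascript:;" pseudo-links.
function filterLinks(array $links): array
{
    $kept = [];
    foreach ($links as $link) {
        $link = trim((string) $link);

        // Skip empty values and in-page anchors such as "#top".
        if ($link === '' || $link[0] === '#') {
            continue;
        }

        // Skip pseudo-links. The scheme may sit at the very start, or after
        // a slash when the extractor has glued it onto a base URL.
        if (preg_match('~(^|/)(javascript|mailto|tel):~i', $link)) {
            continue;
        }

        $kept[] = $link;
    }

    // De-duplicate and reindex from 0.
    return array_values(array_unique($kept));
}
```

Assuming $snoopy->results holds the array shown above, it could be applied as $content = filterLinks($snoopy->results);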


Replies:

QueryList

...['img', 'src']])->data;
// Print the result
print_r($data);
// Collect all hyperlinks on a page
$data = QueryList::Query('/google/list_1.html', ['link' => ['a', 'href']])->data;
// Print the result
print_r($data);

/jae/QueryList

Take a look at this library. It is somewhat more powerful than Snoopy and supports jQuery-style selector syntax.
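If adding a third-party library is not an option, a similar result can probably be reached with PHP's bundled DOM extension. This is a swapped-in technique, not what the reply above recommends: the sketch parses an HTML string with DOMDocument, selects every a element that has an href via XPath, and skips "javascript:" pseudo-links at extraction time. extractLinks() is a hypothetical helper name, and fetching the HTML (for example with file_get_contents or cURL) is left out.

```php
<?php
// Sketch using PHP's bundled DOM extension instead of Snoopy or QueryList.
// extractLinks() is a hypothetical helper name chosen for this example.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];

    // Select every <a> element that carries an href attribute.
    foreach ($xpath->query('//a[@href]') as $a) {
        $href = trim($a->getAttribute('href'));

        // Skip empty values, in-page anchors and javascript: pseudo-links.
        if ($href === '' || $href[0] === '#' || stripos($href, 'javascript:') === 0) {
            continue;
        }
        $links[] = $href;
    }

    // De-duplicate and reindex from 0.
    return array_values(array_unique($links));
}
```

The hrefs come back exactly as written in the page, so relative URLs still need to be resolved against the page address if absolute links are required.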
