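JavaScript + PHP + curl
JavaScript is just JS. Coming back to it years later, I find it is no longer what it used to be. That proves only one thing: it was never good enough, at least not as good as C, regular expressions, Markdown, git, wiki, Lisp, and so on.
(function ($param) {
    // code goes here
})(actualArgument); // In other words: define an anonymous function, then invoke it immediately.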
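The pattern above is commonly called an immediately invoked function expression (IIFE). As a minimal sketch of why it is useful, the example below keeps a variable private inside the function scope; the names `counter`, `step`, and `next` are made up for illustration and do not come from any library.

```javascript
// Hypothetical example: `counter`, `step` and `next` are illustrative names.
var counter = (function (step) {
  // `count` lives only inside this function scope, so outside code cannot touch it.
  var count = 0;
  return {
    next: function () {
      count += step;
      return count;
    }
  };
})(2); // the argument 2 is passed in as `step` and the function runs right away
```

Each call to `counter.next()` advances the hidden `count` by the `step` that was passed in when the function was invoked.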
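jQuery
Why bother with something this old? Because I was studying Mafengwo (马蜂窝), which uses it to load page content. It turns out the 1.12 source is commented. Annoyingly, the newer source no longer seems to have comments.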
jQuery.ready.promise = function( obj ) {
    if ( !readyList ) {
        readyList = jQuery.Deferred();
        // Catch cases where $(document).ready() is called
        // after the browser event has already occurred.
        // we once tried to use readyState "interactive" here,
        // but it caused issues like the one
        // discovered by ChrisS here:
        // http://bugs.jquery.com/ticket/12282#comment:15
        if ( document.readyState === "complete" ) {
            // Handle it asynchronously to allow scripts the opportunity to delay ready
            window.setTimeout( jQuery.ready );
        // Standards-based browsers support DOMContentLoaded
        } else if ( document.addEventListener ) {
            // Use the handy event callback
            document.addEventListener( "DOMContentLoaded", completed );
            // A fallback to window.onload, that will always work
            window.addEventListener( "load", completed );
        // If IE event model is used
        } else {
            // Ensure firing before onload, maybe late but safe also for iframes
            document.attachEvent( "onreadystatechange", completed );
            // A fallback to window.onload, that will always work
            window.attachEvent( "onload", completed );
            // If IE and not a frame
            // continually check to see if the document is ready
            var top = false;
            try {
                top = window.frameElement == null && document.documentElement;
            } catch ( e ) {}
            if ( top && top.doScroll ) {
                ( function doScrollCheck() {
                    if ( !jQuery.isReady ) {
                        try {
                            // Use the trick by Diego Perini
                            // http://javascript.nwbox.com/IEContentLoaded/
                            top.doScroll( "left" );
                        } catch ( e ) {
                            return window.setTimeout( doScrollCheck, 50 );
                        }
                        // detach all dom ready events
                        detach();
                        // and execute any waiting functions
                        jQuery.ready();
                    }
                } )();
            }
        }
    }
    return readyList.promise( obj );
};
References:
- http://sunnylost.com/article/jquery.ready.html
- http://www.cnblogs.com/giggle/p/5246798.html
- http://rapheal.sinaapp.com/2013/01/30/jquery-src-ready/
- http://rapheal.sinaapp.com/2013/01/17/jquery-src-util/
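- The last two belong to a series that dissects jQuery. It is quite good, and I should study it when I get around to working on JS.
Chrome analysis
- Use Inspect to locate the page element and find its id or class.
- Search globally for that id or class to locate the JavaScript.
- Find the AJAX file that gets loaded.
- In the Network panel, look at router.php; below it you can see the headers and the form data. (These parameters are useful; copy them into Postman.)
- That leads to the JavaScript below.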
<script type="text/javascript"> // The code quality is honestly not great.
(function ($) {
    var tagId, page = 1;
    $('.nav li').click(function () {
        var $this = $(this);
        if (!$this.hasClass('on')) {
            tagId = $this.data('tagid');
            page = 1;
            getPage($this);
        }
    }).eq(0).click();
    function getPage($element) {
        $.ajax({
            url: '/ajax/router.php',
            dataType: "json",
            data: {
                'sAct': 'KMdd_StructWebAjax|GetPoisByTag',
                'iMddid': 10183,
                'iTagId': tagId || 0,
                'iPage': page
            },
            type: 'post',
            success: function (ret) {
                if (ret.succ) {
                    $element && $element.addClass('on').siblings().removeClass('on');
                    $('.row-allScenic .scenic-list').html(ret.data.list);
                    $('.row-allScenic ._j_tn_pagination').html(ret.data.page).find('a[data-page]').click(function () {
                        page = $(this).data('page');
                        getPage();
                    });
                }
            }
        });
    }
})(jQuery);
</script>
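For reference, the form data that this `$.ajax` call posts to `/ajax/router.php` can be reproduced outside the page. The following is only a sketch, not Mafengwo's code: the helper name `buildPoiRequestBody` is hypothetical, the parameter names and values are copied from the snippet above, and it assumes a runtime that provides the standard `URLSearchParams` class (modern browsers or Node.js).

```javascript
// Hypothetical helper: builds the application/x-www-form-urlencoded body
// that the page's getPage() function sends for a given tag and page number.
function buildPoiRequestBody(tagId, page) {
  var params = new URLSearchParams();
  params.append('sAct', 'KMdd_StructWebAjax|GetPoisByTag');
  params.append('iMddid', '10183');          // value taken from the page script
  params.append('iTagId', String(tagId || 0)); // same fallback as the page script
  params.append('iPage', String(page));
  // toString() percent-encodes reserved characters such as "|".
  return params.toString();
}

var body = buildPoiRequestBody(undefined, 2);
console.log(body);
```

The encoding step only matters when the body is assembled by hand; tools that take key/value pairs do it themselves.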
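Chrome analysis: addendum
- The Sources panel lets you set several kinds of breakpoints:
  - On JavaScript exceptions.
  - On a specific line of code. You can also make it conditional via "Edit breakpoint", for example inside a for loop when we only want to stop in certain cases.
  - On a category of events, such as mouse clicks.
- The main page is usually the first entry in Sources.
- Code shown in Sources can be edited directly.
- Console can output logs.
- Console can run scripts.
- Elements lets you tweak styles.
- "Break on" can watch an element for changes. Setting breakpoints there seems to be the more sensible approach.
- Workspace lets you edit directly, see the effect immediately, and save the source.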
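The conditional breakpoint mentioned in the list above can also be approximated in the code itself with the standard `debugger` statement, which pauses execution only while DevTools (or another debugger) is attached and is otherwise a no-op. A minimal sketch; the function name `sumItems` and the "negative value" condition are hypothetical and chosen only for illustration.

```javascript
// Hypothetical example: pause inside a loop only for the interesting iterations.
function sumItems(items) {
  var total = 0;
  for (var i = 0; i < items.length; i++) {
    // Same effect as a conditional breakpoint with the expression `items[i] < 0`.
    if (items[i] < 0) {
      debugger; // ignored unless a debugger is attached
    }
    total += items[i];
  }
  return total;
}
```

A conditional breakpoint set in the Sources panel has the advantage that the source file stays unchanged.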
Chrome debugging references
- http://wiki.jikexueyuan.com/project/chrome-devtools/debugging-javascript.html
- http://han.guokai.blog.163.com/blog/static/136718271201321402514114/
- https://segmentfault.com/a/1190000000431586
- http://colinued.leanote.com/post/%E5%89%8D%E7%AB%AF%E5%BC%80%E5%8F%91%E7%A5%9E%E4%B8%80%E6%A0%B7%E7%9A%84%E5%B7%A5%E5%85%B7chrome%E8%B0%83%E8%AF%95%E6%8A%80%E5%B7%A7
- https://www.zhihu.com/question/20260762
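Chrome DevTools shortcuts
- Ctrl+O: open a JS file
- Ctrl+P: same as Ctrl+O
- Ctrl+F: find a keyword in the current JS file
- Ctrl+Shift+F: search for a keyword across all files
- Ctrl+Shift+E: run the currently selected code snippet in the console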
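Postman
- In Headers, only the cookie needs to be set.
- The body takes four parameters; fill them all in with the values seen in the Chrome developer tools above.
- Postman's error messages are fairly accurate: when it says a parameter is wrong, it is almost always that parameter.
- Parameter values do not need to be escaped or encoded.
- Using a pre-request script: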
document.cookie="PHPSESSID=t8efu16md1t6m1cir2o9l68df7; mfw_uuid=5840e43c-5e80-5c9e-05d0-92f90856c93c; __mfwurd=a%3A3%3A%7Bs%3A6%3A%22f_time%22%3Bi%3A1480647746%3Bs%3A9%3A%22f_rdomain%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22f_host%22%3Bs%3A3%3A%22www%22%3B%7D; __mfwuuid=5840e43c-5e80-5c9e-05d0-92f90856c93c; oad_n=a%3A3%3A%7Bs%3A3%3A%22oid%22%3Bi%3A1029%3Bs%3A2%3A%22dm%22%3Bs%3A15%3A%22www.mafengwo.cn%22%3Bs%3A2%3A%22ft%22%3Bs%3A19%3A%222016-12-15+17%3A31%3A57%22%3B%7D; __mfwlv=1482201913; __mfwvn=8; CNZZDATA30065558=cnzz_eid%3D612852601-1480644536-%26ntime%3D1482200023; uva=a%3A4%3A%7Bs%3A2%3A%22lt%22%3Bi%3A1482203297%3Bs%3A10%3A%22last_refer%22%3Bs%3A6%3A%22direct%22%3Bs%3A5%3A%22rhost%22%3Bs%3A0%3A%22%22%3Bs%3A4%3A%22step%22%3Bi%3A17%3B%7D; __mfwlt=1482203307";
// Postman then reported an error: it cannot execute `document`. Surprising.
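The `document.cookie` assignment fails because a pre-request script runs in Postman's own sandbox rather than in a browser page, so there is no `document` object. One possible workaround is to set the Cookie request header instead. The sketch below is an assumption-laden illustration: the helper `buildCookieHeader` and the placeholder cookie values are hypothetical, and the `pm.request.headers.upsert` call is what I understand the Postman sandbox to expose; the guard keeps the snippet harmless when run outside Postman.

```javascript
// Hypothetical helper: join name/value pairs into a Cookie header value.
function buildCookieHeader(cookies) {
  return Object.keys(cookies)
    .map(function (name) { return name + '=' + cookies[name]; })
    .join('; ');
}

// Placeholder values; substitute the cookies copied from Chrome's Network panel.
var cookieHeader = buildCookieHeader({
  PHPSESSID: 'your-session-id',
  mfw_uuid: 'your-uuid'
});

// Assumption: inside a Postman pre-request script a global `pm` exists and
// `pm.request.headers.upsert` adds or replaces a header on the outgoing request.
if (typeof pm !== 'undefined') {
  pm.request.headers.upsert({ key: 'Cookie', value: cookieHeader });
}
```

Setting the cookie once in the Headers tab, as noted in the list above, achieves the same thing without any script.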
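Scraping
- DOMDocument->loadHTMLFile can no longer be used.
- Try file / file_get_contents instead.
- My earlier judgment seems to have been wrong; the context argument appears to be exactly what sets this.
- Choosing between this and curl takes some thought.
Recap
- file / file_get_contents can be combined with Perl-compatible regular expressions.
- DOMDocument->loadHTMLFile can be combined with XPath.
- curl can be combined with XPath.
Comparison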
// using file_get_contents to submit a support ticket
$post_array = array(
    "email" => "someone@gmail.com",
    "problem" => "error happened on baes...bla bla bla"
);
$post_string = http_build_query($post_array);
$opts = array( // see: http://php.net/manual/zh/wrappers.php
    'http' => array( // see: http://php.net/manual/zh/context.http.php
        'method' => "POST",
        'header' => "Content-Type: application/x-www-form-urlencoded",
        'content' => $post_string
    )
);
// http://php.net/manual/zh/function.stream-context-set-option.php
// http://php.net/manual/zh/function.stream-context-create.php
$context = stream_context_create($opts);
// http://php.net/manual/zh/function.file-get-contents.php
// https://doc.phpspider.org/development_skills/file_get_contents-proxy.html
$result = file_get_contents('http://www.cubebackup.com/ticketsubmit.php', false, $context);
if ($result === FALSE) { /* handle the error */ }
// using CURL lib to do the same job
$post_array = array(
    "email" => "someone@gmail.com",
    "problem" => "error happened on bacs...bla bla bla"
);
// This line is not strictly required: CURLOPT_POSTFIELDS also accepts the array
// itself, in which case the body is sent as multipart/form-data.
$post_data = http_build_query($post_array);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.cubebackup.com/ticketsubmit.php');
// in real life you should use something like:
// curl_setopt($ch, CURLOPT_POSTFIELDS,
// http_build_query(array('postvar1' => 'value1')));
// receive server response ...
//curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
//curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$res = curl_exec($ch);
curl_close($ch);
if ($res === FALSE) { /* handle the error */ }
////////////////////////////////////
$curl = new Curl\Curl();
$curl->put('http://api.example.com/user/', array(
    'first_name' => 'Zach',
    'last_name' => 'Borboa',
));
// PECL_HTTP implementation
$problemDesc = "error happened on bacs... bla bla bla";
// urlencode() keeps spaces and punctuation from corrupting the form body
$post_string = "email=someone@gmail.com&problem=" . urlencode($problemDesc);
$res = http_post_data('http://www.cubebackup.com/ticketsubmit.php', $post_string);
if ($res === FALSE) { /* handle the error */ }
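Comparison summary
- curl is still the best option. It plays the same role that regular expressions play for text.
- For simple cases file_get_contents is worth considering, much like the plain string functions. But as soon as things get slightly more complex, a POST for example, this approach makes us remember a lot of constants and the structure of the context array. curl is also faster, because it has many optimizations such as DNS caching.
- The PECL extension is clearly more elegant, and it is really just a layer wrapped around curl. (The regex extension, PCRE, sounds similar but is bundled with PHP rather than installed from PECL.) Reportedly this extension has to be installed separately, and after installing it you still have to compile the PHP source...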
References
- On performance: http://www.programering.com/a/MzMxETNwATI.html
- Original performance article: https://mdb9.wordpress.com/2011/03/06/file_get_contents-vs-curl-what-has-better-performance/
- Introductory reference: https://www.codeproject.com/tips/1019822/comparison-of-the-http-libs-in-php-file-get-conten
- Simple wrapper: https://www.fusionswift.com/2010/02/curl-vs-file_get_contents/
Simple wrapper
<?php
function curl_get_contents($url) {
    // Initiate the curl session
    $ch = curl_init();
    // Set the URL
    curl_setopt($ch, CURLOPT_URL, $url);
    // Removes the headers from the output
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the output instead of displaying it directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Execute the curl session
    $output = curl_exec($ch);
    // Close the curl session
    curl_close($ch);
    // Return the output as a variable
    return $output;
}
?>
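Optimizing curl
Observation shows that both init and close are fairly expensive. Can we call init once and reuse the handle many times?
Stack Overflow says yes:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$results = array();
foreach ($urls as $url) { // $urls: the list of pages to fetch (assumed to exist)
    curl_setopt($ch, CURLOPT_URL, $url);
    $results[$url] = curl_exec($ch); // run exec here, reusing the same handle
}
curl_close($ch);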
Scraped parameters
1. Images.
var mddid = 172703, poiid = 7518;
2. Pages.
data: {
    'sAct' : 'KMdd_StructWebAjax|GetPoisByTag',
    'iMddid' : 10183,
    'iTagId' : tagId || 0,
    'iPage' : page
},