A bit of PHP help?

EsOne

Ok. I am trying to do a basic scrape and echo results via PHP.
I have the information I am trying to scrape, and I think I am using the correct code, but I am not sure, since it is not showing the array of the results when I try it.

Here is a section of the page source I am trying to scrape

Code:
noire","user_id":"3237825","show_in_sig":"1","show_in_profile":1,"last_engine_run":"1260190557","tap_count":"11984","view_count":"10951","total_gold_won":2119743,"env_health":"32166","env_bg_id":null,"env_last_grant_time":"1260190557","inhab_retire":false,"game_info":{"1":{"type":1,"instance_id":"1190571832.1260284818.446690843","open_time":1260292832,"close_time":1260293672,"end_time":1260293732,"length":60,"results_time":1260293742,"state":"open","player_count":6}},"events":

The part I am trying to scrape is the "state":"open"

Here is the code I am using to do this...

Code:
<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$regex = '/"state":"(.+?)","player_count"/';
preg_match($regex,$data,$match);
var_dump($match);
echo $match;
?>

Result is coming back:
array(0) { } Array

Completely not showing the results.

I tried scraping another section using the above code, and it did work, but the section that did work did not have any " around it. I am VERY new at PHP, so I am figuring it is something to do with my $regex, and the whole " in the results I am looking for.
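One way to narrow this down, as a rough sketch: run the same pattern against text pasted straight from the page-source snippet instead of the live fetch. The `$data` string here is a shortened copy of that snippet, not the real response, and the fallback message is only illustrative. If the pattern matches the pasted text, the regex is probably fine and the fetched data is the thing to inspect. Note too that `$match` is an array, so `echo $match;` only prints the word "Array"; the captured value is in `$match[1]`.

```php
<?php
// Shortened copy of the page-source snippet from the post (assumed input,
// not the live response from the URL).
$data = '"length":60,"results_time":1260293742,"state":"open","player_count":6}},"events":';

// Same pattern as in the post.
$regex = '/"state":"(.+?)","player_count"/';

if (preg_match($regex, $data, $match) === 1) {
    // $match[0] is the whole match, $match[1] is the first capture group.
    echo $match[1], "\n";
} else {
    // Nothing matched: print the start of the data to see what was fetched.
    echo "no match; data starts with: ", substr($data, 0, 200), "\n";
}
```

With the pasted snippet this prints `open`, which suggests checking what `file_get_contents` actually returned (for example with `var_dump($data);`) before changing the pattern.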
 
you might find that the quotes are literally just quotes,
so your $regex string might need to be
Code:
'/"state":"(.+?)",":player_count"/';

Of course, just to be sure, you could use the htmlspecialchars function to encode the text you are getting into the format you think you are getting...

but I'd recommend that you take the page source you think you are getting (i.e. as valid HTML, not just plain quotes).

You then use the htmlspecialchars_decode function (http://uk2.php.net/manual/en/function.htmlspecialchars-decode.php) to convert the HTML entities back to plain text...

then search for the plain text (as shown in the code block above, where I changed your $regex string).
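A minimal sketch of that decode-then-search idea, assuming the fetched text really does contain `&quot;` in place of literal quotes. That is only a guess about the response, and `$raw` is a made-up sample rather than real data from the site. The pattern is the one from the original question.

```php
<?php
// Hypothetical input: the same fragment with its quotes HTML-encoded.
$raw = '&quot;state&quot;:&quot;open&quot;,&quot;player_count&quot;:6';

// Convert &quot; (and other special-character entities) back to plain characters.
$decoded = htmlspecialchars_decode($raw, ENT_QUOTES);

// Search the decoded plain text with the pattern from the question.
$found = preg_match('/"state":"(.+?)","player_count"/', $decoded, $match);

echo ($found === 1 ? $match[1] : 'no match'), "\n";
```

If the fetched text already contains plain quotes, the decode step changes nothing and the match behaves exactly as it would without it.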
 
I'm tired so might be off, but is your regex right? Agree with what root has said, so looking at the regex alone:
Code:
/"state":"(.+?)",":player_count"/
To start with, why the slashes at the beginning and end? And why the (.+?) in the middle?

I'm not sure about escaping characters and suchlike specifically for php, but the regex I'd use would simply be:
Code:
("state":"[^"]+")

Or does that not cover all the possibilities?
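On the PHP side, the slashes are the pattern delimiters that the `preg_*` functions require, and parentheses mark the part that ends up in `$match[1]`. A small sketch of the negated-character-class version, moved inside a capture group so only the value is returned; the `$data` string is again a shortened copy of the snippet from the first post, not the live response.

```php
<?php
// Shortened copy of the page-source snippet (assumed input).
$data = '"length":60,"results_time":1260293742,"state":"open","player_count":6}}';

// The slashes delimit the pattern; [^"]+ matches everything up to the next
// double quote, and the parentheses capture just that value.
if (preg_match('/"state":"([^"]+)"/', $data, $match) === 1) {
    echo $match[1], "\n";
}
```

Compared with `(.+?)` followed by `","player_count"`, this form does not depend on which key comes after `state`, so it should keep working if the field order changes.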
 