The downer
Today is all about a hack. I have a pet project, which I believe I have already talked about: a library that scrapes several sites and extracts relevant information from them. So what's new? Well, I was doing the usual scraping and noticed that one HTTP header I had set on the request did not appear to be sent by the API, which in this case is JSoup. Something was wrong, and at first I thought the problem was in my code:
Connection con = DomUtils.get(RODONORTE_URL+"pt");
Connection.Response r = con.execute();
con = DomUtils.get(RODONORTE_URL+"plugins/ximyticket/xiMyticket.ajax.php");
con.ignoreContentType(true);
con.header("Origin","http://www.rodonorte.pt");
con.header("X-Requested-With","XMLHttpRequest");
con.header("Referer","http://www.rodonorte.pt/pt/");
con.data("method","GetStopsRest","isRest","true");
I launched the famous Swiss Army knife, also known as Wireshark (I love the "Go deep" message on its page; I always end up thinking bad stuff), to help me debug the HTTP connection. I was surprised when the captured packets confirmed what I was suspecting:
POST /plugins/ximyticket/xiMyticket.ajax.php HTTP/1.1
Accept-Encoding: gzip
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/43.0.2357.81 Chrome/43.0.2357.81 Safari/537.36
X-Requested-With: XMLHttpRequest
Referer: http://www.rodonorte.pt/pt/
Host: www.rodonorte.pt
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded
Content-Length: 31
method=GetStopsRest&isRest=true
The Origin header is missing from the list of headers that were sent.
Searching for answers
All the others appear to be set. Strange behavior indeed. So I started searching on our friend Google to see whether anyone had run into something similar, and I found an answer:
The author, cleverly, decided to forbid Origin and other CORS-related headers without actually implementing the CORS spec. It's pretty depressing.
Well, depressing indeed. At that moment I started thinking about implementing my own low-level HTTP client. I had done that in the past and I remember one thing: it takes a lot of time. I didn't want to go through the same process again. It just wouldn't be worth the effort. At the same time I started digging into the code to understand what was really happening with the headers and how they were being processed. The first thing I noticed was that JSoup uses HttpURLConnection to make all its HTTP calls. Inside the execute method, which is responsible for the HTTP call, we find this:
HttpURLConnection conn = createConnection(req);
So the next step is trivial: just look at this class's code and find out what is happening.
I found that the declared type is an abstract class, which means I needed to find the concrete subclass that JSoup actually ends up using.
abstract public class HttpURLConnection extends URLConnection {
Since the abstract class HttpURLConnection has lots of subclasses, I found it more useful to run the code and find out at runtime which subclass I needed to look at. After a few minutes I found that the implementation has the same name and is located in the sun.net.www.protocol.http package.
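A quick way to find that runtime class, sketched here with the plain JDK rather than JSoup: create a connection object (nothing goes over the network until it is actually connected) and ask it for its class name. The class name RuntimeClassProbe and the example.com URL are placeholders of mine, not part of the project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class RuntimeClassProbe {
    // Returns the name of the concrete class the JDK hands back for the given URL.
    static String implementationName(String url) {
        try {
            // openConnection() only creates the object; it does not open a socket
            URLConnection conn = URI.create(url).toURL().openConnection();
            return conn.getClass().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(implementationName("http://example.com/"));
    }
}
```

On OpenJDK-based runtimes this should print the sun.net.www.protocol.http class mentioned above; other JVMs may answer differently.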
Looking at this code, I spotted two interesting declarations:
private static final Set<String> restrictedHeaderSet;
private static final String[] restrictedHeaders = new String[]{
    "Access-Control-Request-Headers",
    "Access-Control-Request-Method",
    "Connection",
    "Content-Length",
    "Content-Transfer-Encoding",
    "Host",
    "Keep-Alive",
    "Origin",
    "Trailer",
    "Transfer-Encoding",
    "Upgrade",
    "Via"
};
And there it was, as expected: the Origin header sitting in this strange blacklist.
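The effect of the blacklist can also be observed without Wireshark. The sketch below sets a header on a fresh, unconnected connection and reads it back; with default JDK settings I would expect the restricted header to come back as null while an ordinary one survives. RestrictedHeaderCheck and setAndReadBack are names I made up for the illustration, and example.com is just a placeholder host.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class RestrictedHeaderCheck {
    // Sets a header on an unconnected connection and reads it back.
    // A null result means the JDK silently dropped the header.
    static String setAndReadBack(String name, String value) {
        try {
            URLConnection conn = URI.create("http://example.com/").toURL().openConnection();
            conn.setRequestProperty(name, value);
            return conn.getRequestProperty(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Origin -> " + setAndReadBack("Origin", "http://www.rodonorte.pt"));
        System.out.println("X-Requested-With -> " + setAndReadBack("X-Requested-With", "XMLHttpRequest"));
    }
}
```

No exception is thrown for the dropped header, which is why the problem is so easy to miss.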
Hacking the Blacklist
Worse, this was an internal mechanism that I was unable to change. It was hidden inside the JDK class that JSoup relies on. Since I own neither the source code of JSoup nor that of the JDK, I can't just change the implementation, and asking the maintainers to do it would be a stretch. On the other hand, this is just a pet project that I own and of which I'm the only user, so why not change this blacklist at runtime? That is perfectly possible, and since I manage the codebase alone it is not a problem. So let's write some Java reflection code.
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public static void setStaticValue(Field field, Object value) throws Exception {
    // Make the field accessible for change
    field.setAccessible(true);
    // Reach the private "modifiers" field of Field itself
    // (JDK 12 and later hide it from reflection, so this only works on older JDKs)
    Field modifiers = Field.class.getDeclaredField("modifiers");
    modifiers.setAccessible(true);
    // Clear the FINAL bit so the field can be written
    modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);
    // Set the new value (the target is null because the field is static)
    field.set(null, value);
}
This is the core reflection code that lets me change static final fields at runtime. Well, apparently these are not so final anymore.
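A caveat worth sketching: as far as I know, JDK 12 and later hide the private modifiers field of Field from reflection, so the trick above only works on older runtimes. The demo below tries it on a hypothetical class of its own (StaticFinalDemo and BLOCKED are invented for the example, standing in for the JDK blacklist) and reports whether the replacement really happened instead of crashing.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class StaticFinalDemo {
    // Hypothetical stand-in for the blacklist. An array is not a compile-time
    // constant, so a successful reflective change is visible to later reads.
    private static final String[] BLOCKED = new String[]{"Origin", "Via"};

    static String[] blocked() {
        return BLOCKED;
    }

    // Tries the "modifiers" trick; returns true only if the field was replaced.
    static boolean tryClearBlocked() {
        try {
            Field field = StaticFinalDemo.class.getDeclaredField("BLOCKED");
            field.setAccessible(true);
            // On newer JDKs this lookup fails with NoSuchFieldException
            Field modifiers = Field.class.getDeclaredField("modifiers");
            modifiers.setAccessible(true);
            modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);
            field.set(null, new String[]{});
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // The runtime refused the hack; report it instead of failing
            return false;
        }
    }

    public static void main(String[] args) {
        boolean changed = tryClearBlocked();
        System.out.println("changed=" + changed + ", blocked=" + Arrays.toString(blocked()));
    }
}
```

Which branch you land in depends entirely on the JDK version, which is one more reason to treat this as a curiosity rather than a technique.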
The job was pretty much done with the following code:
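private void hackIntoRestrictedHeaders() throws Exception {
    // The blacklist is declared in the sun.net.www.protocol.http implementation,
    // not in the abstract java.net.HttpURLConnection
    Class<?> impl = sun.net.www.protocol.http.HttpURLConnection.class;
    // Look each field up by name; comparing names with == or relying on the
    // order returned by getDeclaredFields() is fragile
    Field headerSet = impl.getDeclaredField("restrictedHeaderSet");
    Field headerArray = impl.getDeclaredField("restrictedHeaders");
    ReflectionUtils.setStaticValue(headerSet, new HashSet<>());
    ReflectionUtils.setStaticValue(headerArray, new String[]{});
}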
It pretty much looks up the wanted fields and replaces their values with an empty Set and an empty array of Strings.
So the final code would end up looking like this:
public List<Destination> getDestinations(String origin) throws Exception {
    hackIntoRestrictedHeaders();
    Connection con = DomUtils.get(RODONORTE_URL+"pt");
    Connection.Response r = con.execute();
    con = DomUtils.get(RODONORTE_URL+"plugins/ximyticket/xiMyticket.ajax.php");
    con.ignoreContentType(true);
    con.header("Origin","http://www.rodonorte.pt");
    con.header("X-Requested-With","XMLHttpRequest");
    con.header("Referer","http://www.rodonorte.pt/pt/");
    con.data("method","GetStopsRest","isRest","true");
    Document doc = con.post();
    Map jsonData = new Gson().fromJson(doc.body().text(),Map.class);
    return ((ArrayList<LinkedTreeMap<String,String>>) jsonData.get("data"))
        .stream()
        .map(this::parseDestination)
        .collect(toList());
}
And, as expected, the Origin header showed up in the packets captured by Wireshark:
POST /plugins/ximyticket/xiMyticket.ajax.php HTTP/1.1
Accept-Encoding: gzip
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/43.0.2357.81 Chrome/43.0.2357.81 Safari/537.36
Origin: http://www.rodonorte.pt
X-Requested-With: XMLHttpRequest
Referer: http://www.rodonorte.pt/pt/
Host: www.rodonorte.pt
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded
Content-Length: 31
method=GetStopsRest&isRest=true
But in the end
After some more digging into the HTTP call, I found that the Origin header was not important after all and that all my work was basically useless. I ended up not using this reflection stuff at all. Nevertheless, this was a very good exercise in hacking, and it also shows some impressively powerful, and dark, tools we can use in Java.
Final note
As a disclaimer, let me say that using reflection for this kind of hack is a terrible idea and violates every good practice and architectural principle you can name. This was done as a hack and we must look at it as such. A simple upgrade of the library and your code suddenly breaks. By doing this you also risk breaking some runtime execution path that is rarely executed. In a nutshell, just don't do this as a professional. Your life will thank you.
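For completeness, the JDK implementation also reads a system property, sun.net.http.allowRestrictedHeaders, that turns the blacklist off without any reflection. To my understanding it is consulted only once, when the sun.net.www.protocol.http.HttpURLConnection class is initialized, so the sketch below assumes it is set before any HTTP connection is created. AllowRestrictedHeaders, enable and originAfterSet are names I made up, and example.com is a placeholder host.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class AllowRestrictedHeaders {
    static final String PROPERTY = "sun.net.http.allowRestrictedHeaders";

    // Has to run before the JDK's HTTP implementation class is first loaded,
    // because the property is only read in that class's static initializer.
    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    // Sets Origin on an unconnected connection and reads it back.
    static String originAfterSet() {
        try {
            URLConnection conn = URI.create("http://example.com/").toURL().openConnection();
            conn.setRequestProperty("Origin", "http://www.rodonorte.pt");
            return conn.getRequestProperty("Origin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        enable();
        System.out.println("Origin -> " + originAfterSet());
    }
}
```

The same switch can be passed on the command line as -Dsun.net.http.allowRestrictedHeaders=true, which avoids the ordering problem altogether.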