So, I'm Martin Robinson. I'm at Igalia and I work on WebKit, and today I want to talk a bit about the work we've been doing with WebKitGTK. In particular I'm going to focus on some practical things for people who embed WebKit: the changes you'll have to make if you port your application to WebKit2. I want to preface this talk by saying that for us in WebKitGTK, the move to WebKit2 was really a revolutionary step in the development of the library rather than an evolutionary one. It really changed one of the core characteristics of the library, so we're really excited about it.

I suppose I'll start with a quick review for those of you who aren't intimately familiar with WebKit, and talk a little bit about what it is and what it is for. WebKit is what's referred to as a web content engine, which basically means that if you have a web browser, everything inside the chrome, in that little box, is rendered web content, and that's what the library is responsible for, as well as some of the ways in which that content touches the outside world. So it processes and renders web content, and processing includes both parsing the HTML and the CSS and rendering it, as well as running the JavaScript.

It started as a fork of KHTML, and for a little while it was closed source, but eventually it was open sourced in 2005. On the project page, one of the goals of the project is actually that it's open source, that it's usable and visible to everyone, as well as two sort of companion goals: compatibility and compliance. Compatibility means that there's a lot of content on the web and the engine should be able to render that content. It shouldn't break websites that exist. Actually, the criteria for breaking websites are strict: it has to be for something very important, and the affected websites have to be a very small percentage of the sites on the internet. For instance, on the Blink mailing list recently they were talking about removing a feature, and
the feature was used on something like a fraction of a percent of websites, and someone said, "that's a lot." And it is a lot: when you have millions and millions of pages, that's a lot of pages.

The other part of this is compliance, which means that the engine should be compliant with the specs. These are kind of competing goals in a way, because sometimes to be compatible with pages you need to not be compliant with the spec, so it's always this back-and-forth conversation that we have.

Obviously stability and performance are important, because the web browser should be fast and it shouldn't crash. Also security, which I'll talk a little bit more about later; the security issue is very important. Portability: it should be written in a way that makes it useful on a lot of systems, not just a Mac, not just an Intel computer. Then usability and hackability. Hackability is really a statement about the quality of the code. The code should be written in a way that's easily readable and easily changeable. It should be abstracted in the right amount, not too much, not too little, just enough to make it easily hackable. You never want it to be a pain to have to go change the code to fix a bug. Any time there's a barrier in the way, that means fewer bugs will be fixed.

They also state on the website some non-goals, which are in some sense equally important. The first is that it's not meant to be a web browser. It's meant to be a component that's reusable inside web browsers, so there needs to be a dividing line between which features go in the library and which features belong in the embedding application or client. It's also not a science project, which means that it should be relevant to what exists in the world today. It's made to render web content that exists. It shouldn't necessarily be a place to experiment with things that people will never use or that aren't important right now. Those things can be worked out elsewhere, and WebKit can meet them halfway.
The third thing here is that it's not meant to be split into a bunch of reusable components, which is sometimes in contrast with what goes on in GNOME, because a lot of times in GNOME, when we see that there's a piece of GNOME that's useful for a lot of other tools, we split it into a library. In WebKit the philosophy is a little different. Every time you split something out into a library there's some overhead in maintaining it, and you have more consumers. So it's a little bit more, I guess, of a hermit community, where we're all together working on this thing and you don't always want to split it up.

Another interesting thing about WebKit is that it's split into things called ports, and you can kind of see how this goes: there's a GTK port, there are ports for Mac and Windows, as used by Safari, and so on. Ports are essentially the common WebKit code, which is most of the code, plus a layer at the bottom which abstracts away the platform, for instance networking, or how to draw to a canvas, or how to talk to the system. That's at the bottom, and at the top is the API layer. The API layer is what the embedding application uses, and the way WebKit is designed, every port's API layer is a little different.

So for instance, for WebKitGTK, in the platform layer we use libsoup for networking, Cairo for rasterization, OpenGL for making the scene graph, which I'll talk more about later, and for WebGL, and GStreamer for media. WebKit is made in such a way that, for most of the WebKit code, these components are totally abstracted away into wrapper classes that have the same semantics whether you're running on a Mac or on GTK. Any time the semantics differ, it's kind of a little bug that needs to be fixed. There are always some tricky bits in getting the semantics of different platforms to match up, because a CG canvas, Core Graphics, isn't necessarily the same as a Cairo canvas. For instance, in Cairo you store
the path on the canvas, but it's a little different on some other platforms.

And then at the top of WebKitGTK there is the API layer, which is essentially a single GTK widget, the WebKitWebView. That widget is the browser window, the window into the web content, plus some GTK APIs around that. Some of the consumers of WebKitGTK, the embedders, are Epiphany and others, so maybe you're familiar with these applications.

OK, so here's an example of what I was talking about. This is a simplified architecture diagram of WebKit. At the bottom there's this thing called WTF, which is essentially a little bit like Boost: it makes C++ a little nicer to use, includes some collections and some platform abstractions, and abstracts away things like threads. Then there's JavaScriptCore, which is the JavaScript engine; these days, now that Blink has forked, JavaScriptCore is the only JavaScript engine in WebKit.

Sitting on top of that is WebCore, which includes a platform layer and the rest of WebCore. I'm separating those because, again, the platform layer is the classes that wrap Cairo, for instance, whereas the rest of WebCore is functionality that's common to all platforms, like the code that takes a stream of data and parses out CSS rules.

Sitting on top of that is WebKit, which is, how do I describe it, sort of the glue between WebCore and the browser. It includes the API layer but also some code for handling different situations and translating them into API concepts. That's a little fuzzy. On top of that sits the application. Notice that in this diagram, which again is WebKit1, these are all in the same process. This is just a normal library.

So before I start talking about WebKit2, I want to talk a little bit about the motivation for WebKit2. These are some minor philosophical points which I think are the thinking that drove the creation of Chromium and
drove the creation of WebKit2, and which I think mean that this is the future of the web.

So: code has bugs, bugs that crash the program or just make it misbehave. All code has bugs, including bugs that allow arbitrary code execution, especially if that code includes a JavaScript engine that's writing machine code into memory. Not only that, code has dependencies that have bugs. Maybe you've written perfect code, but you're using a library like Fontconfig or Cairo that has one of these bugs. And the fourth point is that even if everything were perfect in your code and in the dependencies, you're going to be processing things from the world that you don't trust. They're like little programs: fonts, images, SVG images. These are all like small sets of instructions, which means that the scope of the data you're processing is wide, and there is a real chance of someone writing a font that can crash your browser. It's very hard to eliminate these problems.

So what's a pragmatic response to this? Maybe you can say that we're going to fix all the bugs in our browser so that it doesn't crash, we're going to eliminate these security issues. But you also have to eliminate the security issues in your dependencies, and you also have to work on sanitising your input data, which is very hard. Instead we say: yes, let's keep working on fixing the crashes in the browser, but let's also make sure that if something goes wrong, it doesn't leave our users vulnerable to attack.

For instance, when we talk about arbitrary code execution, one thing to keep in mind is that these days web applications are applications. They're like desktop applications now. And not only are they like desktop applications, you might be running Angry Birds in your browser while in the tab right beside it is your banking information, and maybe Angry Birds can reach over and touch your bank account. This isn't a hypothetical
situation. These are things that actually happen.

Remember, the web is huge. So this is what we can do. We can acknowledge that the web platform is huge and that every day it's getting bigger. It's adding more functionality, and each time you add functionality you add more chances for vulnerabilities and crashes. We can think of a way to make crashes less inconvenient for users: maybe when the web rendering crashes it doesn't crash the browser, it just crashes that tab, or just the web rendering part. We can prevent crashes and security holes from exposing data from outside the scope of the current page, and the way we can do that is to put that data in another address space, where it's harder to get to, and put some more separation between the data of different applications. We can also prevent bugs and crashes from damaging the system or executing arbitrary code; that's another name for sandboxing. So even if some page compromises its process, it can't write to the hard disk, because that process can't write to the hard disk. And finally, even if we're not talking about a malicious page, just a page that has a really heavy workload, it shouldn't prevent you from using other pages or clicking a menu. It shouldn't prevent you from closing the browser to get away.

So this is the thinking that drives this, because to be honest WebKit2 and Chromium are very complicated architectures and they deserve a good reason. This is the end result: we can put each web rendering part into its own process and have some parent process. In WebKit2 we call the web rendering process the web process, and we call the parent process the UI process, because the actual chrome of the browser is in this UI process. And we can sandbox the web rendering, because once you separate out the web rendering, it doesn't need to write to the hard disk or even read from it. I'll talk a little bit more about how to
make sandboxing easier later.

So this is sort of the first WebKit2 architecture diagram. On the left you can see the older architecture diagram, drawn a little bit differently, but you can see the API boundary was between the application and WebKit. Here we now have two processes, and the API is in the UI process, but underneath that API it's talking over IPC, inter-process communication, to another process which has the rest of the library. So even if this web process crashes, it's not going to be able to crash the browser, or indeed read arbitrary information from the address space of the UI process. Before going on, are there any questions about this particular part? OK. That's reasonable; it's a pretty old concept at this point, since Chrome has been around for a few years.

Now some details about what's inside. I put this here to make it easier to understand the practical bits. Essentially we have two processes now, and they need some way to communicate. I've split those ways into three distinct kinds.

The first is messaging. Say the web process reads the page title and needs to tell the UI process, "I've read the title, change the title bar to reflect that." It sends a message with some arguments. The arguments and the message are serialised into a chunk of data, sent across a socket to the other side, and then deserialised and interpreted.

There's also shared memory, which is used for sending big chunks of data. Say the web process has finished rendering the page to an image and wants to send it: that's too big for the socket, so it sends it as a chunk of shared memory to the UI process, and we avoid making unnecessary copies.

The third is shared surfaces, which are different from shared memory because they are typically on the GPU. The web process has put something on the GPU and wants to send it to the UI process without downloading the data from the GPU, putting it in shared memory and then re-uploading it. So for instance, in the X
11 version of WebKitGTK we use XComposite and XDamage. It's sort of like we make a little window manager, and we send these GPU surfaces to the UI process to draw.

Why do we have to do that? Because web pages these days are more like scene graphs, and they're scene graphs for three main reasons. The first is that we want to prevent unnecessary redraws. Say some div is moving, animating on top of the rest of the web content. Only this div is changing, and maybe only its position. So instead of constantly redrawing the entire page, what if we stored all the different layers of the page in textures and just composited those textures on the GPU again? GPUs are actually really good at compositing, it turns out, so it's quite fast.

The second thing is 3D CSS transforms. The way those usually work is that they're done on the GPU with OpenGL, and once you start doing work on the GPU, it's really expensive to stop and bring it back into main memory only to re-upload it again so you can display it. That's actually enough to kill your frame rate, so it's sort of a non-starter. And the same goes for WebGL. WebGL obviously is OpenGL, which is on the GPU, and downloading and re-uploading again will bring the frame rate below the limits of the human eye.

So the way it works is that the scene graph is built in the web process, and once the scene graph is there, and all the rendering and the compositing is there, you need some way to send those results to the UI process. That's where XComposite and XDamage come in. It's sort of like the way an application does all its rendering and sends it to the window manager. The way this will work in Wayland is probably that we'll use an embedded Wayland compositor, so we're working on that.

All right, so that's sort of the high-level overview of WebKit2. So some of
you may be asking: should I port my application to WebKit2, if you use WebKitGTK or even any other port of WebKit? The answer is yes, you should port your application to WebKit2, even if you don't think it'll be useful. The reason is that WebKit1 in WebKitGTK is moving into maintenance mode. It turns out that it takes a lot of work to maintain a WebKit port, so when your team has to maintain two it's a bit harder. In addition, WebKit1 will be deprecated at some point, because once you stop maintaining it you start worrying about security vulnerabilities and unfixed bugs.

The good thing is that WebKit2 is a better API. It's richer, it exposes more functionality, and it's more in line with the other WebKit2 ports. It's just an all-around better API, because it's the second time around that we've made an API, so we got a lot better at it. On top of all that, if you port your application to WebKit2, without doing anything other than porting, it will be faster and more responsive, and when some random web content crashes it won't crash your application. You can just restart it. It's very nice.

All right, but it's not necessarily easy for all use cases. One of the problems is that there's not yet a proper porting guide, which is a bit of a shame because we've been promising it for a while and we don't have it yet. But there is really good API documentation, and the differences between the two basically boil down to the second point, which is that before, it made sense to do things synchronously. When you wanted to save the page, you just waited until the save was done. In WebKit2 that makes a little less sense, because now you're sending a message to the web process, which again you don't necessarily trust anymore. We're starting to distrust things across a process boundary. So instead of waiting for it, maybe it's better to just send the request: "save the page, and when you're done with that, let me know." What this
means is that a lot of the APIs are asynchronous now, and they look a little bit harder to use. You have to pass a callback and use sort of GIO style, which is intrinsically asynchronous.

The really tricky bit is that if you were doing some kind of deep integration with the web content, interacting with the page and changing it in real time, then it becomes quite a bit trickier. Before, you could reach down into the library and modify the actual DOM in memory, but now it's not in your memory anymore. It's in some other process, a process that, you'll notice, we distrust. So what you have to do is use one of these four techniques: injected script source, custom protocols, the GObject DOM bindings, or page access via the JSC API.

Injected script source is essentially an API on the web view: you give it a string of JavaScript source and it sends that to the web process to be executed in the page context, and the resulting JavaScript return value is serialised and sent back to you. So you can imagine writing a small JavaScript program to walk the elements of the page and do some processing, maybe find, say, the password field, get the contents of the password field, and get that back as a string from the web process.

That looks a bit like this. You call webkit_web_view_run_javascript with the web view, the string here is the script you're running, and then you get a callback. Pretty simple. In the callback you call webkit_web_view_run_javascript_finish, GIO style again, and you get this serialised return value. Everything below that is getting the actual JavaScriptCore values from the return value. These funky JS APIs are the JavaScriptCore API, which is the API for touching the JavaScript engine itself. But you can see that we're just converting this value into a string and then converting that string into a C string. It's a little bit of a pain, a bit verbose, but really, other than this
callback, it's similar to what you would have done before.

Next let me talk about custom protocols. Maybe you've used Chromium before, and you type "about:" something and you get a web page. It's almost like instead of HTTP it's using this "about" protocol, and that's exactly what custom protocols are: you're integrating with the networking library to add a new protocol to the web engine. Not only can you access pages by loading them, you can actually use AJAX to interact with the UI process. For instance, in Epiphany we have a page, about:plugins. It's not there yet, but eventually there'll be a button that says "disable", and what that could do is send an AJAX request to the protocol, and when the UI process gets that request it processes it as if it were a web server, to disable the plugin without reloading the page.

The big issue with this is that it's a web browser and it's subject to same-origin security restrictions. That essentially means that if you're doing AJAX or loading resources, there are restrictions on accessing resources in another scheme/host/port triplet, which means that if you try to access your custom protocol from a web page served over HTTP, it's not going to work. It's going to be a security violation.

So this is what it looks like. Again, we're just registering this "about" protocol, and again with just a callback. What happens here is that we get the request and we can read the different properties of the request, like the path. Here I'm just using the path to print out a response, and I'm sending the response back to the browser as if I were a web server.

Before I talk about the other ones, I want to talk about web extensions. Web extensions are essentially the way that we've exposed some of the more common techniques of interacting with the page in this multi-process environment. Essentially it's a shared object that the web process finds and
loads into its own address space, so you don't have to do any IPC, really, if you're just working inside the confines of the web extension. It's a bit like a plugin that loads in the web process. So you can do things synchronously, like walk through the DOM, and it won't block the UI process at all. The UI process maybe doesn't even know. You don't have to worry about the overhead of IPC, and it's great because you have actual direct access to the DOM objects, just like you did before. This is built on top of the common idea of an injected bundle, which is something that WebKit2 exposes in all ports. Sometimes inside a web extension you want to communicate with the UI process, in which case you can just use D-Bus or whatever you want. Typically we use D-Bus.

This is what that looks like. Here is a source file with this webkit_web_extension_initialize, which is the name of the entry point to the shared object. What happens is that once we compile this into a shared object and set the extensions directory, the web process will find the shared object, load it, and call this function. Here you can print, but you can also use the GObject DOM bindings, which I guess I should explain a little bit too, in case you're not familiar with those.

Essentially there's the DOM, and if you're familiar with web development, you use the DOM in JavaScript to access the internal structure of the page. You can say, "page, give me your divs," and you can look at all the divs, see their contents, their other properties, their CSS properties, whatever. Those are the JavaScript DOM bindings. What that means is that WebKit takes these internal C++ objects and exposes them to JavaScript. Likewise we've written GObject DOM bindings, which means that you can walk the DOM with GObject, and that means you can walk the DOM in C or in any other language that supports GObject
introspection, which is quite nice.

Unfortunately, now that the DOM is in another process, we can't just do that from the UI process anymore. We have to do it in the web extension. Again we see the webkit_web_extension_initialize function, in which we connect to the page-created signal of this extension object. Page-created is like when you open a browser tab and now we have a new browser tab. In the callback for page-created we attach to the document-loaded signal, which obviously fires when the document finishes loading, and at that point maybe we read the title using the exact same DOM bindings API that we had in WebKit1. So with a few more steps we kind of get to feature parity with WebKit1. At this point we're weighing the value of all those things I mentioned before, security, stability, not exposing users' banking information to phishers and scammers, against a couple of function calls and compiling a shared object.

Finally, the most flexible approach, which will be available in an upcoming WebKitGTK release, is that we can directly use the JavaScriptCore API to interact with the page. What this means is that not only can we walk the DOM, but we can make new JavaScript objects that are backed by native code. Say you make a new object, and the page can actually interact with that object. For instance, maybe you want to expose some system functionality to the page if you're making a hybrid application, and you want it to be able to put the screen to sleep, or maybe prevent the screen from sleeping, if you want your video player application not to fall asleep while it's playing a video. What you can do is use this API to expose new objects into the world of the page and have the page's JavaScript interact with them, and so interact with the application. As well as that, you can just execute arbitrary JavaScript in the web process. For this you need to know the JavaScriptCore API, which isn't
actually so complicated, but at some point we'd really like to be able to just expose GObjects directly. That's a ways off, but this is the most flexible approach, and if you really need deep interaction with the page you'll have to do this.

So that was the practical section. I hope it was useful for embedders to see what's involved in porting to WebKit2, and I hope I've convinced you that it's worth it. Keep in mind that this is not just WebKit. The whole web is beginning to look like this, with multiple processes, and it's beginning to look like this because the web is beginning to look like an operating system. The web platform is beginning to look like an application platform, and we already use our browsers like this. Many of you probably keep a web browser open all the time with one application running. That's no different from keeping an application running in your window manager. The distinction between web applications and applications is almost gone. I keep saying it, but it has already happened.

So what's going to happen with WebKit2 in the future? The architecture diagram gets a little bit more complicated. We have more processes, because we did it once and it worked, so why not keep doing it until we run out of process handles? What we have here is that not only do we have web processes, we have a network process and a worker process, or storage process. At first it seems a little bit superfluous, like, why so many different processes? But really it makes good sense, because when you think about it, we really wanted to sandbox the web process. We didn't want it to be able to read the disk or even access the network. Maybe it's dangerous to allow arbitrary code execution to talk to the network. One interesting thing is that the way WebKit2 works now, when the web process crashes all your tabs crash, and really it would be
nice if it were like Chromium, where when a tab crashes it's just that tab. That means we need multiple web processes running, which means that they're all trying to talk to the network. That should be fine, they could do that separately, but once they've talked to the network they take all their data and try to put it into the cache, they try to write to the cookie store, and maybe that cookie store is shared among different processes. That means we start having contention issues and we have to worry about multiple writers and multiple readers. So instead of handling all that, we just split all the networking and all the cookie storage into its own process, and we have all these different processes talk to this one network process.

Likewise, there are APIs in the web platform that actually write to the disk, and if we sandbox the web process to disallow writing to the disk, those APIs won't work. So instead of leaving the capability to write to the disk in there with this possibly malicious JavaScript code, we split out the disk access into a worker process, or storage process.

The way that we want to think about these process communications, again, is that we distrust the process on the other side. We have to code as if that process has already been compromised and is sending us the most evil messages possible. But that's a lot easier than if there were no single point of communication between the processes, if we had to make that decision all over the place instead of just where we're doing the IPC handling. Similar to what I was talking about before, now we isolate applications from each other as well as from the UI process, and a web process crash doesn't make all the tabs crash, just that one page. It also makes sandboxing a lot easier.

The nice thing about this storage process is that disk access is really slow, so there's always some blocking going on. If we always do that asynchronously in another process, there's no issue with that. It could be a thread, but then we couldn't
sandbox it. And that's the future of WebKit2.

That was my talk. Are there any questions? I can answer them now.
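
The messaging path described in the talk (a message and its arguments are serialised into a chunk of data, sent across a socket, then deserialised on the other side) can be illustrated with a small sketch. This is only an illustration in Python, not WebKit's actual wire format: the message name `DidChangeTitle`, the JSON encoding and the 4-byte length prefix are all assumptions made for the example, and a local socket pair stands in for the connection between the web process and the UI process.

```python
import json
import socket
import struct


def encode_message(name, *args):
    """Serialise a message name and its arguments into one length-prefixed chunk."""
    payload = json.dumps({"name": name, "args": list(args)}).encode("utf-8")
    # 4-byte big-endian length, followed by the payload itself.
    return struct.pack("!I", len(payload)) + payload


def read_exactly(sock, count):
    """Read exactly `count` bytes, because recv() may return fewer."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the socket mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def decode_message(sock):
    """Read one length-prefixed message from the socket and deserialise it."""
    (length,) = struct.unpack("!I", read_exactly(sock, 4))
    message = json.loads(read_exactly(sock, length).decode("utf-8"))
    return message["name"], message["args"]


def demo():
    """Send one hypothetical title-changed message from 'web' end to 'UI' end."""
    web_end, ui_end = socket.socketpair()
    try:
        web_end.sendall(encode_message("DidChangeTitle", "Example Domain"))
        return decode_message(ui_end)
    finally:
        web_end.close()
        ui_end.close()


if __name__ == "__main__":
    print(demo())
```

The receiving side only ever sees bytes from the socket, which is the point made in the talk: all decoding happens in one place, so that is where a distrusted peer's input can be validated.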
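
The same-origin restriction mentioned for custom protocols compares a scheme/host/port triplet. A minimal sketch of that comparison follows, assuming default ports of 80 for http and 443 for https. It is a simplification for illustration: real engines apply further rules (for example, special handling of origins without a host), and the function names `origin` and `same_origin` are hypothetical, not part of any WebKit API.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when a URL does not spell out its port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triplet for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(first_url, second_url):
    """True only when scheme, host and port all match."""
    return origin(first_url) == origin(second_url)


if __name__ == "__main__":
    # A page served over HTTP trying to reach a custom 'about' scheme.
    print(same_origin("http://example.com/page", "about:plugins"))
```

Under this check, a page loaded over HTTP and a resource on a custom scheme never share an origin, which matches the restriction described in the talk.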