HTTP Endpoint
API Client Library Documentation
This documentation is for the HTTP Endpoint, for GET and POST requests.
For API Libraries and Examples in these or other languages, refer to the Docs Index Page.
These examples use GET requests because they are the simplest way of showing examples and getting started, but it is strongly suggested you use the POST endpoint for "real" usage.
Basic Examples:
png if you need transparency (if the backgrounds are weirdly black)
@media directives. Format your page accordingly!
pageResponse.pageRequest for the default values to other parameters.
Advanced Examples:
viewport to 640px width, zooms out, and sets a 640x500 clipRectangle to generate a thumbnail image of the CNN.com website.
www.highcharts.com has been replaced with wikipedia.org but the path is preserved. Read the text of the output error message to see this.
clearCache:true parameter. This is required so that .css resources are re-requested and blocked.
pageResponses.cookies property.
clearCache:true parameter. This is required so that all resources are re-requested so headers are populated.
Hello:world and Accept-encoding:tacos values in the resulting HTML. Uses our example show request details page.
outputAsJson:true or you will not get usable results.
application/json Content-Type and submit POST data. Check the payload section of the output results.
resourceTimeout parameter like this example does. We will be developing our own proxy service in the coming months.
Other Examples: See the Docs Index Page Usage FAQ for other examples (such as Page Automation / Button Clicking).
To use this API, you need to submit a GET or POST request to the API endpoint:
GET requests are the simplest way of showing examples and getting started, but it is strongly suggested you use the POST endpoint for "real" usage.
[YOUR-KEY]/?request=[REQUEST-JSON]
[REQUEST-JSON] should be encoded via encodeURIComponent(). Do not encode the parts individually.
[YOUR-KEY]/?requestBase64=[REQUEST-JSON]
[REQUEST-JSON] should be Base64 encoded. Do not encode the parts individually, and do not URL-encode before you Base64 encode.
[YOUR-KEY]/[REQUEST-JSON]
application/json
[YOUR-KEY]: Your key used for creditBalance/billing. For the Stage2 Preview, this can be anything you want. Later you can obtain this by signing up.
[REQUEST-JSON]: The details of your request as a JSON object. This JSON can take one of the following three forms, each of which is described on the io-datatypes doc page:
{url:"http://www.example.com",renderType:"jpg",outputAsJson:true}
[{url:"http://www.google.com"},{url:"http://www.google.com/doodles/",renderType:"jpg",outputAsJson:true}]
{proxy:false,pages:[{url:"http://www.google.com"},{url:"http://www.google.com/doodles/",renderType:"jpg",outputAsJson:true}]}
While the url is required, everything else is optional. Default values are used for any parameters you leave blank.
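As a minimal sketch of the request forms above, the following JavaScript builds both GET URLs and a POST payload. API_BASE is a placeholder (the endpoint prefix is not shown in this section), and sending the JSON as the POST body with an application/json Content-Type is an assumption based on the text above. Buffer is Node.js specific; a browser would need another Base64 routine.

```javascript
// Placeholder only: substitute the real API endpoint prefix.
const API_BASE = "https://api.example.invalid/";
const API_KEY = "YOUR-KEY";

// One of the request forms shown above (a single page request).
const request = { url: "http://www.example.com", renderType: "jpg", outputAsJson: true };
const requestJson = JSON.stringify(request);

// GET form 1: URL-encode the whole JSON string in one pass,
// rather than encoding its parts individually.
const getUrl = API_BASE + API_KEY + "/?request=" + encodeURIComponent(requestJson);

// GET form 2: Base64-encode the raw JSON (no URL-encoding beforehand).
const requestBase64 = Buffer.from(requestJson, "utf8").toString("base64");
const getUrlBase64 = API_BASE + API_KEY + "/?requestBase64=" + requestBase64;

// POST form (assumed shape): JSON body with an application/json Content-Type.
const postUrl = API_BASE + API_KEY + "/";
const postOptions = {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: requestJson,
};

console.log(getUrl);
console.log(getUrlBase64);
console.log(postUrl, postOptions.method);
```

The same requestJson string feeds all three forms, so switching from GET to POST for "real" usage only changes how the string is transported.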
You can use CORS (preferred) or JSONP to integrate API calls from PhantomJs Cloud directly into your webpage.
CORS: enabled by default.
JSONP: to use the JSONP format, add the ?callback=CALLBACK_FUNCTION_NAME querystring to your request (for either GET or POST requests). You MUST set outputAsJson:true for JSONP to work. See the Advanced Examples section for a demo.
Please refer to the main Docs Index Page for Basic Troubleshooting, Testing and Performance Optimization, Usage FAQ, and language-specific samples.
Various methods in the phantom object, as well as in WebPage instances, utilize phantom.cookies objects. These are best created via object literals.
Unix epoch timestamp (in ms). JavaScript example: (new Date()).getTime() + (1000 * 60 * 60) // <-- expires in 1 hour
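A small sketch of building a cookie as an object literal with a millisecond expiry. The field names (name, value, domain, path, expires) are assumed from PhantomJS-style cookie objects, since this section does not list them, and makeCookie is a hypothetical helper.

```javascript
const ONE_HOUR_MS = 1000 * 60 * 60;

// Hypothetical helper; field names are an assumption (PhantomJS-style cookies).
function makeCookie(name, value, domain, lifetimeMs) {
  return {
    name,
    value,
    domain,
    path: "/",
    // Unix epoch timestamp in milliseconds, as described above.
    expires: Date.now() + lifetimeMs,
  };
}

const cookie = makeCookie("session", "abc123", "example.com", ONE_HOUR_MS);
console.log(new Date(cookie.expires).toISOString());
```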
information about the frames of the page
number of children contained by this frame
the children of this page (a hierarchy of frames)
the html content of the frame
the name of the frame. use this when requesting the frame to be rendered
the url of the frame
The parameters for requesting and rendering a page. When you submit an array of IPageRequests, they are loaded in order, and only the last one is rendered. All variables except 'url' are optional.
if specified, will be used as the content of the page you are loading (no network request will be made for the url). However, the url property is still required, as that will be used as the page's "pretend" url. example: content:"<h1>Hello, World!</h1>",url:"about:blank"
TRUE to return the page contents and metadata as a JSON object (see IUserResponse). If FALSE, we return the rendered content in its native form.
settings related to rendering of the last page of your request. See the IRenderSettings documentation (below) for details
"html": returns the html text,
"jpeg"|"jpg": The default. renders page as jpeg. transparency not supported (use png for transparency),
"png": renders page as png,
"pdf": renders page as a pdf,
"script": returns the contents of window['_pjscMeta'].scriptOutput. see the scripts parameter for more details,
"plainText": return the text without html tags (page plain text),
settings related to requesting internet resources (your page and resources referenced by your page)
Execute your own custom JavaScript inside the page being loaded. See IScripts docs for more details.
add the nodes from your pageResponse that you do not wish to transmit. This reduces the size of your data, thus reducing cost and transmission time. if you need the data in these nodes, simply remove it from this array.
required. the target page you wish to load
adjustable parameters for when making network requests to the url specified
Information about the page transaction (request and its response).
cookies set at the moment the page transaction completed.
events that occurred during requesting and loading of the page and its content
the Frames contained in the page. The first is always the main page itself, even if no other frames are present.
headers for the primary resource (the url requested). for headers of other resources, inspect the pageResponse.events (key='resourceReceived')
information about the processing of your request
the request you sent, including defaults for any parameters you did not include
the status code for the page, a shortcut to metrics.targetUrlReceived.value.status
options for specifying headers or footers in a pdf render.
if specified, this is used for the first page (instead of the repeating)
required. Supported dimension units are: 'mm', 'cm', 'in', 'px'. No unit means 'px'.
if specified, this is used for the last page (instead of the repeating)
if specified, this is used for single page pdfs (instead of the repeating)
specify a header used for each page. use wildcards for pageNum,numPages as shown in this example:
repeating:<h1><span style='float:right'>%pageNum%/%numPages%</span></h1>
options specific to rendering pdfs. IMPORTANT NOTE: we strongly recommend using px as your units of measurement.
Border is optional and defaults to 0. A non-uniform border can be specified in the form {left: '2cm', top: '2cm', right: '2cm', bottom: '3cm'}. Use of px is strongly recommended.
set the DPI for pdf generation. defaults to 150, which causes each page to be 2x as large (use "fit to paper" when printing) If you want exact, proper page dimensions, set this to 72.
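To make the "2x as large" remark concrete, here is a rough arithmetic sketch. It assumes the rendered page size in pixels is simply inches multiplied by DPI, which is an assumption about how the service scales pages; pageSizeInPixels is a hypothetical helper.

```javascript
// Hypothetical helper: pixels = inches * dpi (assumed scaling model).
function pageSizeInPixels(widthIn, heightIn, dpi) {
  return {
    width: Math.round(widthIn * dpi),
    height: Math.round(heightIn * dpi),
  };
}

// US Letter is 8.5in x 11in.
const at72 = pageSizeInPixels(8.5, 11, 72);   // 612 x 792
const at150 = pageSizeInPixels(8.5, 11, 150); // 1275 x 1650

// Linear scale factor between the default (150) and exact (72) settings.
const scale = 150 / 72; // roughly 2.08

console.log(at72, at150, scale.toFixed(2));
```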
settings for footers of the pdf
Supported formats are: 'A3', 'A4', 'A5', 'Legal', 'Letter', 'Tabloid'.
settings for headers of the pdf
height and width are optional if format is specified. Use of px is strongly recommended. Supported dimension units are: 'mm', 'cm', 'in', 'px'. No unit means 'px'.
optional. ('portrait', 'landscape') and defaults to 'portrait'
height and width are optional if format is specified. Use of px is strongly recommended. Supported dimension units are: 'mm', 'cm', 'in', 'px'. No unit means 'px'.
optional png quality options passed to PngQuant. you must set pngOptions.optimize=true to enable these, otherwise the original non-modified png is returned.
2 to 256. default 256.
default false. true to disable dithering
default false, which is to return the original png. if you pass true, we will optimize the png using PngQuant. smaller file size but takes longer to process
1 to 100. default 80. Instructs pngquant to use the least amount of colors required to meet or exceed the max quality.
default 0. If conversion results in quality below the min quality the image won't be compressed
default 8 (very fast). value can range between 1 (slow) and 11 (fast and rough)
authentication information for the proxy. ex: username:password
the address and port of the proxy server to use. ex: 192.168.1.42:8080
If your proxy requires an IP to whitelist, use api-static.phantomjscloud.com for your requests.
type of the proxy server. default is http. available types are http, socks5, and none
allows specifying a proxy for your userRequest (all the pageRequests it contains). To use the built-in proxy servers, you must set the geolocation parameter.
Alternatively, you may use your own custom proxy server by setting the custom parameter.
allows you to use a custom proxy server. if you set this, the built-in proxy will not be used. default=NULL
specify the geographic region of the builtin proxy server you use.
defaults to any. possible values are any, us (USA), de (Germany), gb (Great Britain), ca (Canada), sg (Singapore).
IMPORTANT: Not yet implemented. So for now, all values are treated as any.
specify what builtin proxy server you use.
by default, the auto-proxy system will randomly pick from an available proxy server.
If you want to specify a specific (fixed) proxy server, set this instanceId to a number, then all requests will direct to the same builtin server.
If you want to use the proxy server in a round-robin style (recommended!) each request should increment this instanceId by one.
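The round-robin style described above could be sketched as follows. createRoundRobinProxy is a hypothetical helper, and placing geolocation and instanceId together in the userRequest's proxy object is an assumption based on the parameter names in this section.

```javascript
// Hypothetical helper: each call returns proxy options with the next instanceId.
function createRoundRobinProxy(geolocation, startId = 0) {
  let nextId = startId;
  return function nextProxyOptions() {
    return { geolocation, instanceId: nextId++ };
  };
}

const nextProxy = createRoundRobinProxy("us");

// Each userRequest gets an instanceId one higher than the previous one.
const userRequests = ["http://www.example.com/a", "http://www.example.com/b"].map((url) => ({
  proxy: nextProxy(),
  pages: [{ url }],
}));

console.log(JSON.stringify(userRequests));
```

To pin every request to the same builtin server instead, reuse a single fixed instanceId rather than incrementing it.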
when a page is rendered, use these settings.
This property defines the rectangular area of the web page to be rasterized when using the requestType of png or jpeg. If no clipping rectangle is set, the entire web page is captured. Beware: if you capture too large an image it can cause your request to fail (out of memory). you can choose any dimensions you wish as long as you do not exceed 32M pixels
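A quick pre-flight check for the pixel limit might look like the sketch below. Reading "32M" as 32,000,000 pixels and the clipRectangle field names (top, left, width, height) are both assumptions; clipRectangleFits is a hypothetical helper.

```javascript
// Assumed reading of "32M pixels"; the more conservative interpretation.
const MAX_PIXELS = 32 * 1000 * 1000;

// Hypothetical helper: true when the rectangle's area is within the limit.
function clipRectangleFits(rect) {
  return rect.width * rect.height <= MAX_PIXELS;
}

// Field names are assumed (PhantomJS-style clip rectangle).
const thumbnail = { top: 0, left: 0, width: 640, height: 500 };
const oversized = { top: 0, left: 0, width: 8000, height: 5000 };

console.log(clipRectangleFits(thumbnail), clipRectangleFits(oversized));
```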
default false. If true, we will pass through all headers received from the target URL, with the exception of "Content-Type" (unless the renderType=html)
pdf specific settings. Example:
{
  border: "0",
  footer: {
    firstPage: "", height: "1cm", lastPage: "", onePage: "", repeating: "<h1><span style='float:right'>%pageNum%/%numPages%</span></h1>"
  },
  format: "letter",
  header: {
    firstPage: "", height: "0cm", lastPage: "", onePage: "", repeating: ""
  },
  height: "11in",
  orientation: "portrait",
  width: "8.5in"
}
optional png quality options passed to PngQuant. you must set pngOptions.optimize=true to enable these, otherwise the original non-modified png is returned.
jpeg quality. 0 to 100. default 70. ignored for png, use pngOptions to set png quality.
specify an IFrame to render instead of the full page. must be the frame's name.
size of the browser in pixels
height is not used when taking screenshots (png/pdf). The image will be as tall as required to fit the content. To set your screenshot's dimensions, use the pageRequest.clipRectangle property.
This property specifies the scaling factor for the screenshot (requestType png/pdf) choices. The default is 1, i.e. 100% zoom.
settings related to requesting internet resources (your page and resources referenced by your page)
username/password for simple HTTP authentication
if true, will clear the browser memory cache before processing the request. Good for expiring data, and very important if blacklisting resources (see resourceModifier parameter). Default is false.
if true, will clear cookies before processing the request. Default is false.
IMPORTANT NOTE: to protect your privacy, we always clear cookies after completing your transaction. This option is only useful if making multiple requests in one transaction (IE: multiple pageRequests in a userRequest API call)
Set Cookies for any domain, prior to loading this pageRequest. If a cookie already exists with the same domain+path+name combination, it will be replaced. See ICookie for documentation on the cookie parameters.
specify additional request headers here. They will be sent to the server for every request issued (the page and resources). Unicode is not supported (ASCII only)
example: customHeaders:{"myHeader":"myValue","yourHeader":"someValue"}
if you want to set headers for just the target page (and not every sub-request) use the pageRequest.urlSettings.headers parameter.
delete any cookie with a matching "name" property before processing the request.
set to true to disable all Javascript from being processed on your page.
set to true to skip loading of inlined images. If you are not outputting a screenshot, you can usually set this to true, which will decrease load times.
the maximum amount of time (timeout) you wish to wait for the page to finish loading. When rendering a page, we will give you whatever is ready at this time (page may be incompletely loaded). Can be increased up to 5 minutes, but that only should be used as a last resort, as it is a relatively expensive page render.
array of regex + adjustment parameters for modifying or rejecting resources being loaded by the webpage.
Example: "resourceModifier": [{regex:".*css.*",isBlacklisted:true},{"regex": "http://mydomain.com.*","setHeader": {"hello": "world","Accept-encoding": "tacos"}}]
IMPORTANT NOTE: If you use this to blacklist resources, it is strongly recommended you also set the clearCache parameter. This is because cached resources are not requested, and thus will not be able to be blacklisted.
maximum amount of time to wait for each external resource to load. we kill the request if it exceeds this amount.
maximum amount of time to wait for each external resource to load (.js, .png, etc). If the time exceeds this, we don't cancel the resource request, but we don't delay rendering the page if everything else is done.
if true, will stop page load upon the first error detected, and move to next phase (render or next page)
default useragent is "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/534.34 (KHTML, like Gecko) Safari/534.34 PhantomJS/2.0.0 (PhantomJsCloud.com/2.0.1)"
Milliseconds to delay rendering after the last resource is finished loading (default is 1000ms). This is useful in case there are any AJAX requests or animations that need to finish up. If additional network requests are made while we are waiting, the waitInterval will restart once finished again. This can safely be set to 0 if you know there are no AJAX or animations you need to wait for (decreasing your billed costs)
set to true to enable web security. default is false
set to true to prohibit cross-site scripting attempts (XSS)
regex + adjustment parameters for modifying or rejecting resources being loaded by the webpage. Example: {regex:".*css.*",isBlacklisted:true}
special pattern matching regex. capture groups can replace parts of the changeUrl that use the special marker tokens $$0, $$1, etc. on to $$9.
for example: if resourceUrl="http://google.com/somescript.js", changeCaptureRegex="^.*/(.*)$" would create a match group for everything after the last / character, and changeUrl="http://example.com/$$1" would then get evaluated to "http://example.com/somescript.js"
changes the current URL of the network request. This is an excellent (and the only) way to provide an alternative implementation of a remote resource.
you can even use a dataURI so that you can set the contents directly. Example: data:,Hello%2C+World!
additionally, you can use special marker tokens to replace parts of the changeUrl with the original resource url. the special marker tokens are $$port, $$protocol, $$host, and $$path. For example changeUrl="$$protocol://example.com$$path"
also, you can use the changeCaptureRegex parameter to provide custom marker tokens.
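The marker-token substitution can be modeled locally with the sketch below. This is not the service's implementation: expandChangeUrl is a hypothetical function, and the exact meaning of each token (for instance $$host excluding the port, $$path excluding the query string) is an assumption. The capture pattern here uses a greedy .* so the group holds the text after the last / character.

```javascript
// Local model of the token substitution; not the service's own code.
function expandChangeUrl(resourceUrl, changeUrl, changeCaptureRegex) {
  const parsed = new URL(resourceUrl);

  // split/join is used instead of String.replace because "$" is special
  // in replacement strings.
  let result = changeUrl
    .split("$$protocol").join(parsed.protocol.replace(":", ""))
    .split("$$host").join(parsed.hostname)
    .split("$$port").join(parsed.port)
    .split("$$path").join(parsed.pathname);

  if (changeCaptureRegex) {
    const match = new RegExp(changeCaptureRegex).exec(resourceUrl);
    if (match) {
      // $$0 is the whole match, $$1..$$9 are the capture groups.
      for (let i = 0; i <= 9; i++) {
        if (match[i] !== undefined) {
          result = result.split("$$" + i).join(match[i]);
        }
      }
    }
  }
  return result;
}

const viaCapture = expandChangeUrl(
  "http://google.com/somescript.js",
  "http://example.com/$$1",
  "^.*/(.*)$"
);
const viaTokens = expandChangeUrl(
  "http://www.highcharts.com/a/b.js",
  "$$protocol://example.com$$path"
);

console.log(viaCapture);
console.log(viaTokens);
```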
if true, blacklists the request unless a later matching resourceAdjustor changes it back to false (we process in a FIFO fashion). By default, we don't blacklist anything. You should keep it this way when rendering jpeg (where the visuals matter). If processing text/data, blacklisting .css files [".*\.css.*"] will work fine. Check the response.metrics for other resources you could blacklist (example: facebook, google analytics, ad networks).
pattern used to match a resource's url. examples: it really depends what the site is and what you are wanting to block, but for example to block anything with the text "facebook" or "linkedin" in the url:
resourceModifier:[{regex:".*facebook.*",isBlacklisted:true},{regex:".*linkedin.*",isBlacklisted:true}]
It's especially useful if you just need the text, as you can block all css files from loading, such as: ".*\.css.*"
Don't use this to block images. Instead, images are blocked by using the requestSettings.ignoreImages:true property.
optional key/value pairs for adjusting the headers of this resource's request. example: {"Accept-encoding":"gzip", "hello":"world"}
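The first-in, first-out blacklist behavior can be illustrated with a local model. This is a sketch of the described semantics, not the service's code: isBlacklisted is a hypothetical function, and treating each regex as a search anywhere in the url is an assumption.

```javascript
// Local model: walk the modifiers in order; a later match overrides an earlier one.
function isBlacklisted(resourceUrl, modifiers) {
  let blacklisted = false; // by default nothing is blacklisted
  for (const modifier of modifiers) {
    if (!new RegExp(modifier.regex).test(resourceUrl)) continue;
    if (typeof modifier.isBlacklisted === "boolean") {
      blacklisted = modifier.isBlacklisted;
    }
  }
  return blacklisted;
}

const modifiers = [
  { regex: ".*\\.css.*", isBlacklisted: true },            // block all css...
  { regex: ".*critical\\.css.*", isBlacklisted: false },   // ...except this file
];

const siteCss = isBlacklisted("http://example.com/site.css", modifiers);
const criticalCss = isBlacklisted("http://example.com/critical.css", modifiers);
const appJs = isBlacklisted("http://example.com/app.js", modifiers);

console.log(siteCss, criticalCss, appJs);
```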
properties exposed to your custom scripts via window._pjscMeta
set to false by default. set to true to force rendering immediately. good for example, when you want to render as soon as domReady happens
set to false by default. if true, will delay rendering until you set it back to false. good if you are waiting on an AJAX event.
allows you to override specific pageRequest options with values you compute in your script (based on the document at runtime)
set the clipRectangle for image rendering. here is an example you can run in your domReady or loadFinished script: _pjscMeta.optionsOverrides.clipRectangle = document.querySelector("h1").getBoundingClientRect();
Scripts can access (readonly) details about the page being loaded via window._pjscMeta.pageResponse. See IPageResponse for more details.
Your scripts can return data to you in the pageResponse.scriptOutput object. You can access this directly via window._pjscMeta.scriptOutput or your script can simply return a value and it will be set as the scriptOutput (not available on external, url loaded scripts)
how many custom scripts have been loaded so far
Execute your own custom JavaScript inside the page being loaded.
INPUT
You can pass in either the url to a script to load, or the text of the script itself. Example: scripts:{domReady:["//cdnjs.cloudflare.com/ajax/libs/jquery/2.1.0/jquery.js","return 'Hello, World!';"]}
OUTPUT
Your scripts can return data to you in the pageResponse.scriptOutput object. You can access this directly via window._pjscMeta.scriptOutput or your script can simply return a value and it will be set as the scriptOutput (not available on external, url loaded scripts)
Also, if you use the pageRequest.renderType="script" setting, your response will be the scriptOutput itself (in JSON format), which allows you to construct your own custom API. A very powerful feature!
triggers when the dom is ready for the current page. Please note that the page may still be loading.
triggers when we determine the page has been completed. If your page is being rendered, this occurs immediately before then.
IMPORTANT NOTE: Generally you do NOT want to load external scripts (url based) here, as it will hold up rendering. Consider putting your external scripts in domReady
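Putting the INPUT and OUTPUT notes together, a page request that returns only a script's result might be assembled as in this sketch. The script body and target url are illustrative, and the request is only serialized here, not sent.

```javascript
// Script bodies are passed as strings and run inside the loaded page.
// A plain "return" sets the scriptOutput, as described above.
const countLinksScript = "return document.querySelectorAll('a').length;";

const pageRequest = {
  url: "http://www.example.com",
  // With renderType "script", the response is the scriptOutput itself.
  renderType: "script",
  scripts: {
    domReady: [countLinksScript],
  },
};

const scriptRequestJson = JSON.stringify(pageRequest);
console.log(scriptRequestJson);
```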
adjustable parameters for when making network requests to the url specified. used by PageRequest.
submitted in POST BODY of your request.
defaults to 'utf8'
custom headers for the target page. if you want to set headers for every sub-resource requested, use the pageRequest.requestSettings.customHeaders parameter instead.
GET (default) or POST
The 'main' form of user request, allows specifying pages to load in order. Later will provide other 'global' options such as geolocation choices.
optional, specify an alternate backend instead of the default phantomjs process. the default value if not specified is default.
options are: default: the current stable backend (phantomjs v2.1.1). beta: the latest backend we are testing (phantomjs v2.5b). You can also specify an exact backend: phantom 2.1.1 or phantom 2.5beta.
setting this forces the value of the outputAsJson parameter, regardless of what the last page's value of outputAsJson was set to. default is undefined.
array of pages you want to load, in order. Only the last successfully loaded page will be rendered.
Use proxy servers for your request. default=false.
set to true to enable our builtin proxy servers, or use the parameters found at IProxyOptions for more control/options, including the ability to specify your own custom proxy server.
IMPORTANT: for now, to use the builtin proxy servers, you must use the api endpoints found at ouo.io/Aa2GNC. This is because our proxy provider requires whitelisting us by static IP addresses. This requirement will be removed after we exit Beta.
Additionally, when you use proxy servers, be aware that requests will be slower, so consider increasing the pageRequest.resourceTimeout parameter like the Proxy Example does.
This is returned to you when "outputAsJson=true".
the rendered output of the last pageRequest
data in either base64 or utf8 format
utf8 or base64
headers of the target url, only set if pageRequest.renderSettings.passThroughHeaders===true
filename you could use if saving the content to disk. this will be something like 'content.text', 'content.jpeg', 'content.pdf' thus this informs you of the content type
the size of data, in bytes
the final url of the page after redirects
metadata about the transaction
information about the PhantomJsCloud.com system processing this transaction
identifier of the system, for troubleshooting purposes
PhantomJs
version of phantomjs. (major/minor/point)
number of requests processed by this backend
how much this transaction costs.
NOTE: the creditCost, prepaidCreditsRemaining, and dailySubscriptionCreditsRemaining are also returned in the HTTP Response Headers via the keys pjsc-credit-cost, pjsc-daily-subscription-credits-remaining, and pjsc-prepaid-credits-remaining.
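Reading these billing headers could be sketched as below. readCreditHeaders is a hypothetical helper that accepts anything with a get(name) method (a fetch Headers object has one; a Map stands in for it here), and treating the header values as numeric is an assumption.

```javascript
// Hypothetical helper: pulls the three billing headers into one object.
function readCreditHeaders(headers) {
  const toNumber = (name) => {
    const raw = headers.get(name);
    // Missing headers become null rather than NaN.
    return raw === undefined || raw === null ? null : Number(raw);
  };
  return {
    creditCost: toNumber("pjsc-credit-cost"),
    dailySubscriptionCreditsRemaining: toNumber("pjsc-daily-subscription-credits-remaining"),
    prepaidCreditsRemaining: toNumber("pjsc-prepaid-credits-remaining"),
  };
}

// Stand-in for a real response's headers; the values are made up for the demo.
const demoHeaders = new Map([
  ["pjsc-credit-cost", "0.5"],
  ["pjsc-prepaid-credits-remaining", "120"],
]);

const credits = readCreditHeaders(demoHeaders);
console.log(credits);
```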
the total cost of this response
estimation of your remaining daily creditBalance. This is incrementally refilled hourly.
a hint written by our pjsc-be-phantom backend so the API endpoint knows whether it should send back only the content.
the original request, without defaults applied. to see the request with defaults, see pageResponses.pageRequest
a collection of load/processing information for each page you requested.
the HTTP Status Code PhantomJsCloud returns to you
This property defines the rectangular area of the web page to be rasterized when using the requestType of png or jpeg. If no clipping rectangle is set, the entire web page is captured.