
Wednesday, February 20, 2019

Check whether a website has been updated

Sometimes I want to detect whether a specific website has been updated. If the site provides an RSS feed, that can be used to decide, but it is not so easy when the only information is the web page itself. One approach is to save the page to some separate storage and compare against it, but this time I tried a method that does the comparison without using any storage.

The LogicFlow I created has been uploaded to GitHub.

Logic Apps and Flow have a property that ordinary applications and APIs do not: they can keep running for a long time. Normally it is considered correct to finish processing in as short a time as possible, but that is not the case with Logic Apps and Flow. A Logic Apps run can last up to 90 days, and even in Flow a run can last 30 days. A good example is the approval function (Approval connector) often used in Flow, where it is usual for a few days to pass between requesting approval and getting a result. With this way of thinking, a task like this one can be done quite simply.

The LogicFlow I created is as follows.

It uses a request (HTTP Request) trigger. First, the target site is fetched with the HTTP connector. The site I tried to monitor this time is the release notes page for Microsoft Flow (https://docs.microsoft.com/ja-jp/business-applications-release-notes/powerplatform/released-versions/flow). The content fetched on the previous day is compared with the content fetched today, and the page is judged to have been updated if there is a difference.

However, this site contains hidden items with dynamically generated internal URLs, and those URLs were different on every access depending on the timing. To deal with this, the comparison is limited to the value of the meta tag that represents the update date. If a site has nothing like this embedded, there is no problem in comparing the values fetched with the HTTP connector as they are.

The part limited to the meta tag is extracted as follows.

First, the HTML content obtained by the HTTP connector is encoded with the uriComponent function. Logic Apps and Flow have characters that are not easy to handle, such as line feed codes. For example, \r\n represents a line break, but after the uriComponent function it becomes the string %0D%0A. The encoded result is then divided into an array, line by line. To do so, the split function is given the encoded line break from the example above as its delimiter.

split(uriComponent(body('Get_MicrosoftFlow_Weekly_Release_Note')),'%0D%0A')
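For readers who want to try the idea outside Logic Apps, the same encode-then-split step can be sketched roughly in Python. This is only an analogue: `urllib.parse.quote` with `safe=""` stands in for `uriComponent` (an assumption, since the two do not escape exactly the same set of characters, although both turn CR LF into `%0D%0A`), and the sample HTML and the name `encode_and_split` are made up for illustration.

```python
from typing import List
from urllib.parse import quote

# Hypothetical sample standing in for the body returned by the HTTP action.
html = '<html>\r\n<meta name="updated_at" content="2019-02-19" />\r\n</html>'


def encode_and_split(body: str) -> List[str]:
    """Percent-encode the whole body, then split on the encoded CR LF."""
    encoded = quote(body, safe="")  # "\r\n" becomes "%0D%0A"
    return encoded.split("%0D%0A")


lines = encode_and_split(html)
print(len(lines))
```

Each element of `lines` is one encoded line of the page, which is the shape the filter step works on.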

From the converted array, extract the part that is the meta tag.

equals(startsWith(item(), uriComponent('<meta name=\"updated_at\"')), true)

The startsWith function judges whether a string starts with the specified characters. Using it, the array is restricted to only the lines that describe the meta tag. Analysis of the page showed that the metadata representing the update date is set under the name updated_at, so only that element is kept, and the result of the filter is set to an array variable as it is.
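The filter can be pictured with a small Python sketch. It assumes the lines were percent-encoded beforehand, so the prefix has to be encoded in the same way before comparing. The sample lines and the names `PREFIX` and `filter_updated_at` are hypothetical, and `quote` is again only an approximation of `uriComponent`.

```python
from urllib.parse import quote

# Hypothetical encoded lines, shaped like the output of the split step.
encoded_lines = [
    quote(s, safe="")
    for s in [
        "<html>",
        '<meta name="description" content="Release notes" />',
        '<meta name="updated_at" content="2019-02-19 10:00 AM" />',
        "</html>",
    ]
]

# The prefix is encoded the same way as the lines it is compared against.
PREFIX = quote('<meta name="updated_at"', safe="")


def filter_updated_at(lines):
    """Keep only the lines that start with the encoded updated_at meta tag."""
    return [line for line in lines if line.startswith(PREFIX)]


updated_at_lines = filter_updated_at(encoded_lines)
print(updated_at_lines)
```

As with the expression above, a line that has leading whitespace before the tag would not match a plain prefix test.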

On the first run there is no previous result to compare against, so a condition (IF) step makes that decision. The previous data is passed in through the HTTP Request trigger, so if no data arrives with the trigger, it is judged that there is no previous data.

If previous data exists, it is set to an array variable so that it has the same shape as the data fetched this time. The two array variables are then compared: if they differ, there was an update; otherwise the page is judged to be unchanged.
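The first-run check and the comparison can be summarized in a few lines of Python. This is a minimal sketch rather than the flow itself: `has_update` is a hypothetical name, and returning `None` for "nothing to compare yet" is a choice made here for illustration.

```python
def has_update(previous, current):
    """Return None on the first run, otherwise True if the arrays differ."""
    if not previous:  # no data arrived with the trigger
        return None
    # Normalize both sides to lists so they have the same shape.
    return list(previous) != list(current)


print(has_update(None, ["a"]))
print(has_update(["a"], ["a"]))
print(has_update(["a"], ["b"]))
```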

Finally, the flow waits for one day and then calls the same LogicFlow again with the current data attached. This lets the data fetched this time be carried over to the next run. Because the necessary information can be passed along without going through external storage, this pattern seems worth adapting and using.
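The idea of carrying state from one run to the next, with no storage in between, can be simulated in Python. In this sketch a loop stands in for "wait one day, then call the same LogicFlow with the data attached". The names `run_once` and `simulate` and the snapshot values are all hypothetical.

```python
from typing import Callable, List, Optional


def run_once(previous: Optional[List[str]], fetch: Callable[[], List[str]]) -> dict:
    """One run: fetch the current data and compare it with what was passed in."""
    current = fetch()
    updated = previous is not None and previous != current
    # "next_payload" is what the run would attach when calling itself again.
    return {"updated": updated, "next_payload": current}


def simulate(snapshots: List[List[str]]) -> List[bool]:
    """Chain runs so that each one receives the previous run's payload."""
    payload = None  # the very first run has no previous data
    results = []
    for snapshot in snapshots:
        outcome = run_once(payload, lambda s=snapshot: s)
        results.append(outcome["updated"])
        payload = outcome["next_payload"]  # in the flow: delay one day, then call
    return results


history = simulate([["v1"], ["v1"], ["v2"]])
print(history)
```

The only state that survives between runs is the payload handed to the next call, which is what makes external storage unnecessary.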
