<?xml version="1.0" encoding="utf-8"?>
<launchpad-bug id="1018694">
  <date_last_updated>2012-06-29 19:22:49.665698+00:00</date_last_updated>
  <api_links>
    <bug_api_link>https://api.launchpad.net/1.0/bugs/1018694</bug_api_link>
    <bug_owner_link>https://api.launchpad.net/1.0/~anton-khalikov</bug_owner_link>
    <milestone_link>https://api.launchpad.net/1.0/maria/+milestone/5.2</milestone_link>
    <linked_branches_collection_link>https://api.launchpad.net/1.0/bugs/1018694/linked_branches</linked_branches_collection_link>
    <activity_link>https://api.launchpad.net/1.0/bugs/1018694/activity</activity_link>
  </api_links>
  <bug_web_link>https://bugs.launchpad.net/bugs/1018694</bug_web_link>
  <owner>Anton Khalikov</owner>
  <assignee>Sergei</assignee>
  <milestone_title>Maria 5.2</milestone_title>
  <duplicate_link></duplicate_link>
  <duplicate_bug_id></duplicate_bug_id>
  <title>CPU_TIME counter from USER_STATISTICS gets wrong after some uptime</title>
  <status>New</status>
  <importance>Medium</importance>
  <created>2012-06-28 03:49:20.524066+00:00</created>
  <description>
<![CDATA[Hi there

We collect user statistics by running "SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS; FLUSH USER_STATISTICS" every 5 minutes and storing the returned values in RRD databases. We noticed that after some uptime the CPU_TIME counter goes mad and starts to show incredibly high usage values. After the MariaDB process is restarted, it goes back to normal for some time. Please see the attached graphs as examples. Both graphs show a huge drop in the CPU_TIME counter. These are graphs for a random customer from two different MariaDB servers.

Platform used: Debian Squeeze amd64, MariaDB 5.2.12-MariaDB-mariadb115~squeeze-log from the official package.]]>  </description>
  <activities>
    <activity datechanged="2012-06-28T03:49:20.524066+00:00">
      <oldvalue>
<![CDATA[]]>      </oldvalue>
      <newvalue>
<![CDATA[]]>      </newvalue>
      <whatchanged>bug</whatchanged>
      <person>Anton Khalikov</person>
      <message>added bug</message>
    </activity>
    <activity datechanged="2012-06-28T03:49:20.524066+00:00">
      <oldvalue>
<![CDATA[]]>      </oldvalue>
      <newvalue>
<![CDATA[Graph 1 https://bugs.launchpad.net/bugs/1018694/+attachment/3206286/+files/dbstdusercpu.png]]>      </newvalue>
      <whatchanged>attachment added</whatchanged>
      <person>Anton Khalikov</person>
      <message></message>
    </activity>
    <activity datechanged="2012-06-28T03:49:48.580746+00:00">
      <oldvalue>
<![CDATA[]]>      </oldvalue>
      <newvalue>
<![CDATA[Another graph https://bugs.launchpad.net/maria/+bug/1018694/+attachment/3206287/+files/dbstdusercpu2.png]]>      </newvalue>
      <whatchanged>attachment added</whatchanged>
      <person>Anton Khalikov</person>
      <message></message>
    </activity>
    <activity datechanged="2012-06-29T19:21:34.393370+00:00">
      <oldvalue>
<![CDATA[]]>      </oldvalue>
      <newvalue>
<![CDATA[Sergei (sergii)]]>      </newvalue>
      <whatchanged>maria: assignee</whatchanged>
      <person>Elena Stepanova</person>
      <message></message>
    </activity>
    <activity datechanged="2012-06-29T19:21:40.595763+00:00">
      <oldvalue>
<![CDATA[Undecided]]>      </oldvalue>
      <newvalue>
<![CDATA[Medium]]>      </newvalue>
      <whatchanged>maria: importance</whatchanged>
      <person>Elena Stepanova</person>
      <message></message>
    </activity>
    <activity datechanged="2012-06-29T19:21:52.152167+00:00">
      <oldvalue>
<![CDATA[]]>      </oldvalue>
      <newvalue>
<![CDATA[5.2]]>      </newvalue>
      <whatchanged>maria: milestone</whatchanged>
      <person>Elena Stepanova</person>
      <message></message>
    </activity>
  </activities>
  <comments>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/1" datecreated="2012-06-28T03:49:20.524066+00:00">
      <person>Anton Khalikov</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[]]>      </content>
    </comment>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/2" datecreated="2012-06-28T03:49:48.580746+00:00">
      <person>Anton Khalikov</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[]]>      </content>
    </comment>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/3" datecreated="2012-06-28T14:56:34.836854+00:00">
      <person>Elena Stepanova</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[Hi Anton,

Does the counter start showing weird values for all clients simultaneously, or does it happen for one client only?
Is it only CPU_TIME, or do other counters go mad too?
Can you provide an example of the data when it _starts_ happening (e.g. two or three snapshots before it shoots up, and two or three snapshots after)? No graphs are necessary; raw data will be fine if it's easier.

Thanks.]]>      </content>
    </comment>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/4" datecreated="2012-06-29T04:22:52.865205+00:00">
      <person>Anton Khalikov</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[Hi Elena,

1. Yes, the counter started to show wrong values for all clients simultaneously. I checked the weekly graphs of more than 10 random clients.
2. Yes, it affects only the CPU_TIME counter.
3. Do you want me to send you a backup (a snapshot of the database files) of a random database of a random customer? I can do that, but unfortunately I can't provide the SQL queries they run.

I have about 500 clients on this server and over 1000 on another, which was affected too. To be honest, I am not sure exactly what to send.]]>      </content>
    </comment>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/5" datecreated="2012-06-29T13:25:06.747847+00:00">
      <person>Elena Stepanova</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[Hi Anton,

Just to clarify, it started showing *the same* wrong values for all clients simultaneously, right? 
]]>      </content>
    </comment>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/6" datecreated="2012-06-29T19:21:21.407159+00:00">
      <person>Elena Stepanova</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[I suppose it's one of the bunch of bugs filed in regard to broken user stats:

https://bugs.launchpad.net/percona-server/+bug/608027
https://bugs.launchpad.net/percona-server/+bug/924872
https://bugs.launchpad.net/percona-server/+bug/728082

Some of them say that a fix is committed in 5.1, so maybe there is something useful to port. 
Assigning to Sergei to decide. 
]]>      </content>
    </comment>
    <comment commentlink="https://api.launchpad.net/1.0/maria/+bug/1018694/comments/7" datecreated="2012-06-29T19:22:49.296451+00:00">
      <person>Anton Khalikov</person>
      <subject>
<![CDATA[Re: CPU_TIME counter from USER_STATISTICS gets wrong after some uptime]]>      </subject>
      <content>
<![CDATA[Hi Elena,

Yes, the values started to be counted wrong for all clients at the same moment (simultaneously). But no, the values are not the same for everyone, if that is what you meant. It looks like MariaDB at some moment started to multiply the real CPU_TIME by a factor, and this factor is different for every single client. So clients who had high CPU usage values before the problem occurred began to show incredibly high CPU usage values (take a look at the second graph: the maximum there is about 14000%, which is just impossible), but not every client had such high values.

If you want to know how we calculate these percentages, it's simple math. We flush the counters every 5 minutes, which means 300 seconds. So if a customer's cpu_time within this 5-minute interval equals 300 seconds, it's 100%. If cpu_time within 5 minutes only adds up to 30 seconds, it's 10%. And so on. If a customer has 600 seconds of cpu_time within a 5-minute interval, it's 200%, which means the customer's processes used 2 processor cores at 100% each.
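
As a rough sketch of that arithmetic in Python: the function names are hypothetical, and the core count passed to the plausibility check is an assumed example value, not the real core count of the affected servers.

```python
INTERVAL_SECONDS = 300  # counters are flushed every 5 minutes


def cpu_percent(cpu_time_seconds, interval_seconds=INTERVAL_SECONDS):
    """Convert CPU_TIME accumulated in one interval to a percentage of one core."""
    return cpu_time_seconds * 100.0 / interval_seconds


def is_plausible(cpu_time_seconds, cores, interval_seconds=INTERVAL_SECONDS):
    """A user cannot accumulate more CPU time than cores * interval length.

    `cores` is an assumption supplied by the caller (hypothetical value).
    """
    return 0 <= cpu_time_seconds <= cores * interval_seconds


print(cpu_percent(300))  # 100.0
print(cpu_percent(30))   # 10.0
print(cpu_percent(600))  # 200.0
```

With this formula, the 14000% peak on the second graph corresponds to 42000 seconds of cpu_time within one 300-second interval, which a check like is_plausible(42000, cores=8) would reject.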

Hope this helped.]]>      </content>
    </comment>
  </comments>
  <messages>
    <message created="2012-06-28 03:49:20.524066+00:00" owner="Anton Khalikov">
<![CDATA[]]>      <attachment link="https://bugs.launchpad.net/bugs/1018694/+attachment/3206286" type="Unspecified">
        <title>Graph 1</title>
        <file>LPexportBug1018694_dbstdusercpu.png</file>
      </attachment>
    </message>
    <message created="2012-06-28 03:49:48.580746+00:00" owner="Anton Khalikov">
<![CDATA[]]>      <attachment link="https://bugs.launchpad.net/bugs/1018694/+attachment/3206287" type="Unspecified">
        <title>Another graph</title>
        <file>LPexportBug1018694_dbstdusercpu2.png</file>
      </attachment>
    </message>
  </messages>
</launchpad-bug>
